A Busy, Warm April

24.04.2010

April has been one hectic month! Between launching WhatGrowersUse.com, pre-launch marketing for Optimise and fixing PlantConnection, I’ve barely had a moment to stop and think!

Add to this the fact that I’ve been swotting up on all things dev. I love coding in PHP and today I managed to hack out a super slender piece of functionality for an awesome *top secret* project I’m working on!

I basically need to pull a massive XML feed (think >10MB) and process the contents into a database. So this is grand-scale processing of huge wads of XML data – something I’ve never really had to do before.

This may seem a bit backwards, but moving away from the XML is imperative. This project needs to be fast and agile. I don’t want to have to allocate huge chunks of memory to my whole PHP layer just to deal with a few records in various, disparate XML files.

Starting with SimpleXML

SimpleXML is a fantastic XML parser. Being able to turn XML into native PHP objects, arrays and variables is supremely handy. However, “Simple” is definitely the operative word! When it comes to huge amounts of XML data, SimpleXML just doesn’t cut it!

So in my search for an alternative, I came across something I’ve never used before: a little-known – and it seems little-used – PHP extension called XMLReader. My hero.

XMLReader saves the day

XMLReader is what is known as a pull parser… you grab what you need as the parser races through the XML at lightning speed. It streams the file into context too: reading, caching, splurging… so you can deal with the XML before the whole file is loaded into memory.

This is a huge advantage for massive files. As long as the XML is valid, you can run away processing elements and attributes at a blistering pace.

Something to remember though is that doing it this way is no good if you then try to process all of this data in memory intensive variables or objects (which is the problem with SimpleXML). You either face increasing the memory allocation to PHP (which is limited by how much RAM you have) or you find a quick way to deal with the data and move on.

It’s grab-and-release… no time for processing here. Any kind of re-assignment of data where volumes exceed 30,000 reps of a while-loop will suck your memory dry. So no massive strings, no huge arrays and definitely no objects!
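To make the grab-and-release idea concrete, here is a minimal sketch of an XMLReader loop. The feed layout (a product element with a sku attribute) is purely an assumption for illustration, and the XML is inlined so the snippet runs on its own; for a real multi-megabyte feed you would stream from disk with open() instead.

```php
<?php
// Hypothetical feed: the <product> element and its sku attribute are assumptions.
$xml = <<<XML
<?xml version="1.0"?>
<feed>
  <product sku="A1"><name>Trowel</name></product>
  <product sku="B2"><name>Seed tray</name></product>
</feed>
XML;

$reader = new XMLReader();
// For a real feed, $reader->open('/path/to/feed.xml') streams from disk instead.
$reader->XML($xml);

$count = 0;
while ($reader->read()) {
    // Only react to opening <product> tags; skip everything else.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        $sku = $reader->getAttribute('sku');
        // ... hand $sku to the database here, then let it go ...
        $count++;
    }
}
$reader->close();

echo $count, " products seen\n";
```

Note that nothing is accumulated between passes: each value is read, used and overwritten on the next iteration, which is what keeps memory flat.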

Execution Time

The only issue left is execution time. Between parsing the XML file and saving it to a local MySQL db, your script could take a lot longer than your default execution time.

XMLReader is pretty darn snappy and there’s really no way to improve that without digging into some C code and rebuilding the extension. The code that actually does stuff with the data from the XML is very minimal. The biggest challenge is the time spent writing to the database.

I’m using MySQL. The quickest method of writing to a database (save reformatting the data into some kind of delimited text file and using LOAD DATA INFILE) is mysqli prepared statements. The query is parsed once and only the values are sent on each execution, which greatly reduces database load, so running thousands (or hundreds of thousands) of queries can be done in mere seconds.
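Here is a rough sketch of the prepare-once, execute-many pattern. To keep it runnable without a database server I have swapped in PDO with an in-memory SQLite database; that is a stand-in, not what the project uses. The table, column names and rows are hypothetical, and wrapping the loop in a transaction is my own addition rather than something described above.

```php
<?php
// Stand-in connection: in-memory SQLite via PDO, so no server is needed.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE products (sku TEXT, name TEXT)'); // hypothetical table

// Prepare once, outside the loop...
$insert = $db->prepare('INSERT INTO products (sku, name) VALUES (?, ?)');

// Hypothetical rows; in the real script these come from the XMLReader loop.
$rows = [['A1', 'Trowel'], ['B2', 'Seed tray'], ['C3', 'Dibber']];

$db->beginTransaction();
foreach ($rows as $row) {
    // ...and execute many times, sending only the values.
    $insert->execute($row);
}
$db->commit();

$total = (int) $db->query('SELECT COUNT(*) FROM products')->fetchColumn();
echo $total, " rows inserted\n";
```

With mysqli the shape is the same: call prepare() once, bind_param() the variables, then call execute() inside the loop.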

However, even this will be too slow if your max_execution_time is 30 seconds. But the only real way to speed MySQL up is a faster processor and improved disk-write speed (think SSDs). Those are expensive options.

The simplest option? Increase script execution time. If this isn’t a user page, you can be a little more relaxed on timeouts. You should be able to use the set_time_limit() function to increase execution time. In fact, each call restarts the timeout counter from zero, so it has the handy habit of extending your current execution time by the limit you set.

<?php
  // Reset the timeout counter on every pass so the loop never hits the limit.
  while ( $xmlReader->read() ) {
    set_time_limit(2);
    // ... get data, save to DB etc ...
  }
?>

This will give each loop an extra two seconds to execute, which should be more than enough time to parse an extra few XML nodes and execute a prepared statement one more time.

So there it is. If you’d like to see the full code or have any questions, just ask; I’d be happy to share it. A huge thanks goes to Chad Fennell for his excellent post on XMLReader.

Dawn Ascends! Ready Your Swords!

21.12.2009

I absolutely love coding in PHP. Sometimes I get distracted by the glitz and glamour of some of the more popular languages (and their associated frameworks) – and I agree, they have their place. But PHP is in a class of its own.
Read the rest…

Wouldn’t It Be Great If All APIs Followed RESTful Principles?

02.12.2009

APIs and connecting web applications together is going to be the next challenge of the evolution of the web. The next decade should see easier-to-implement, yet more secure methods for connecting the various web applications that we use.
Read the rest…

Putting My DBA Hat On… Again

06.10.2009

It’s not often I have to worry too much about the minutiae of database administration… well, I try not to. But this question on StackOverflow got me intrigued, so I put on my trilby.

labratmatt was having a bit of a problem with inserting data into a MySQL table with a field defined as DECIMAL(3,2). Can you guess what his problem was? That’s right… 9.99! How did you guess?
This has got to be one of the most popular MySQL-related Google searches. The initial problem is easy to solve… correct the presumptuous field definition.
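In case the 9.99 isn’t obvious: DECIMAL(3,2) means three digits in total, two of them after the decimal point, which leaves only one digit for the whole part. A tiny sketch of that arithmetic (decimalMax is a hypothetical helper of mine, not a MySQL or PHP function):

```php
<?php
// Largest value a DECIMAL(precision, scale) column can hold, as a string.
function decimalMax(int $precision, int $scale): string
{
    $intDigits = $precision - $scale; // digits left of the decimal point
    $whole = $intDigits > 0 ? str_repeat('9', $intDigits) : '0';
    return $scale > 0 ? $whole . '.' . str_repeat('9', $scale) : $whole;
}

echo decimalMax(3, 2), "\n"; // 9.99
echo decimalMax(5, 2), "\n"; // 999.99
```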
However, the underlying problem is really why his data was being truncated, even inserted incorrectly. You may notice the same if you run MySQL (v5.0+) from a default setup on other field types: VARCHAR for example, where you set a maximum field length. When you INSERT data that is too long it simply gets truncated.
Not hugely worrying you may think, especially in development and testing phases. True. But this wasn’t enough for me, so I went on a hunt.
I found this interesting article by Robin Schumacher on MySQL Data Integrity.
It seems that there is a configuration variable in MySQL (v5.0+) called ‘sql_mode’ that determines exactly how strict MySQL should be when writing data to tables. The problem is that, by default, it’s unset, which means MySQL uses its standard mode… fudged SQL.
It has a vast array of options, so read through and choose wisely.
The default MySQL setup essentially turns all of your INSERT and UPDATE statements into INSERT/UPDATE IGNORE statements. It is an unexpected “gotcha” for many… any self-respecting software developer would want the INSERT query to fail and for the DBMS to tell you why it failed, not automatically munge the data for you.
To achieve this, the general option to use for ‘sql_mode’ is STRICT_ALL_TABLES… but even this has some gotchas (VARCHAR and TEXT expect only string values etc…) and may need to be combined with other options.
Of course, if you write your programs to send MySQL the correct datatypes, changing this option shouldn’t cause any problems :)
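If you can’t touch the server configuration, the mode can also be set per connection. Below is a small sketch that just builds the statement; strictModeSql is a hypothetical helper, and the mysqli connection in the final comment is assumed rather than created here.

```php
<?php
// Build the statement that switches the current session to strict mode.
// STRICT_ALL_TABLES is the mode named above; any extra modes are your choice.
function strictModeSql(array $modes = ['STRICT_ALL_TABLES']): string
{
    return "SET SESSION sql_mode = '" . implode(',', $modes) . "'";
}

echo strictModeSql(), "\n";

// With a live mysqli connection (assumed, not created here) you would run:
// $mysqli->query(strictModeSql());
```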
The annoying thing is that I’ve only just found out about this now after nearly 6 years of database development.

Enabling PHP extensions on a Mac

14.01.2009

So I hit my first attempt at using something not bundled in Apple’s ‘safe’ PHP build for Mac OS X. I decided to implement memcached on my big project. I found out that memcached is pretty easy to set up and had that running in no time.

The problems start when trying to get PHP to talk ‘memcached’. Windows binaries come bundled with the latest stable release of the PECL Memcache library, but Apple has decided not to bundle it into the PHP Apache module.

Then I got thinking: “Well what about other extensions that I might need?”. I resolved to get this working so I knew what to do for future extensions!

I’d read so much about how Apple has disabled extensions and there’s no way to make it work… blah blah blah… Then I found this brilliant tutorial by Erik Giberti.

Stupidly, though, I didn’t follow the guidelines to the letter (I was probably distracted), but Erik provided continuing support. I Twitter’d him and he seemed more than happy to oblige. One of the good guys!

Upshot… I now have my first working self-compiled extension to PHP loaded and empowering my local development platform. That wasn’t so bad, was it?

For future reference:

  1. Download latest stable extension from PECL
  2. cd Downloads/{library_name-x.x.x}/
  3. phpize
  4. ./configure (may need extra compiler options!)*
  5. make
  6. sudo make install
  7. Update PHP ini with extension=(whatever the installed file is)

You may need to find out what your extension_dir ini variable is (HINT: phpinfo!) as this is where the .so file will need to be. In point 7 just put the file name not the path.
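If you’d rather not wade through the whole phpinfo page, a quick script along these lines should tell you the same things. It’s only a sketch; ‘memcache’ is the extension from this post, so swap in whichever one you built.

```php
<?php
// 'memcache' is the extension from this post; swap in whichever one you built.
$extension = 'memcache';

$iniFile = php_ini_loaded_file();    // false if no php.ini was loaded
$extDir  = ini_get('extension_dir'); // where the .so file must live

echo 'php.ini:       ', $iniFile ?: '(none loaded)', "\n";
echo 'extension_dir: ', $extDir, "\n";
echo $extension, ': ', extension_loaded($extension) ? 'loaded' : 'not loaded', "\n";
```

Bear in mind that the command-line PHP and the Apache module can load different ini files, so run it through whichever one you are actually configuring.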

I will no doubt be adding more libraries in the coming weeks. If I get any trouble, I will let you know. Sound off in the comments with your issues with PHP extensions in Mac OS X.

UPDATE 04/09:
*If you’re running a 64-bit architecture (and more than likely have a 64-bit build of PHP) you should run the following at point 4 instead:
MACOSX_DEPLOYMENT_TARGET=10.5 CFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp" CCFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" CXXFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" LDFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -bind_at_load" ./configure

Unfinished Business

07.07.2008

I just wanted to write a quick post (as it’s late and my Mrs wants me to come to bed) about lingering jobs. We all have them. Washing up that’s starting to grow its own ecosystem, that lightswitch that’s still hanging off the wall (but it works!)…

For me it’s a plethora of odd jobs for old clients. In my case they come back for more, usually at the most inconvenient times. This time however it’s all my fault. In desperation I have offered to do work in the hopes that I will finally get these people off my back.

It’s not that they’re begging me to do things, but more for my own sanity. I guess I have some mild form of OCD. I just want things tidy. Is that so bad?

I find that eventually there are so many things on my mind that I can do none of them. So I suffer silently in a corner somewhere waiting for the light to come on. Ultimately it’s a crippling fear of failure that grips me most. Of course it is that very fear that breeds itself.

I’m sure you understand…

So follow my changing mental state on Twitter. You’ll catch some of my mini rants and you may even witness my entire nervous breakdown (reality TV eat your heart out!)

When “Can I…” or “I’m going to…” becomes “I want you to…”

26.06.2008

Dear Audrie,

Today I have been mostly making “final tweaks” to a fairly simple project that I’ve been working on. It’s a content site with a Flash gallery.
It’s quite easy, but we’re not dealing directly with the client. We’re dealing with a go-between art studio who produced the design. They’ve basically done the whole rebranding and needed someone to put the site together.
This sounds great from a programmer’s point of view: No designing, just programming… more like this please.
There is a major problem with this though. The design studio tend to get their own “ideas” of how the copy should go and what goes where. All within their power? ‘Yes’ you might say.
However, it becomes a little silly when as the programmer you are providing a product to a client that allows them to update their website themselves and you end up making all the changes.
I’m not going to name drop because that may enlighten the client a little too much for the studio’s liking. It may also highlight how this particular studio stays in business.
I know this sounds like I’m just getting at the studio in question, but rather I’m trying to portray just how difficult a programmer’s life tends to be. You sign up for one thing and do 101 other tasks.
In my experience the following rules apply:
Is it possible to…   means … Do that…
Can I…               means … Do that…
I’m going to…        means … Do that…
And so I find that instead of one person to answer to, I have hundreds all vying desperately for my time. And all increasing in their requests.
Why can’t people tell you all of the things that need doing and just let you get on with it? I much prefer that as opposed to this ‘do this and get back to me’ approach.
Ah well.
Until next time!

iPhone 2.0 Apps Could Alter The Way We Engage

19.06.2008

Dear Judy,

Since the Jobs-note last Monday (9th), a lot of media attention has been focussed on the App Store and the coming features of iPhone 2.0.

Sadly, orders of magnitude more media attention has been focussed on the 3G iPhone – the highly anticipated, but hugely underwhelming revelation that it is.

The iPhone’s new little (big) brother does address some of the more glaring omissions in the original. And in my view it does finally justify the hefty price tag. But even this has been reduced! To the shock of all!

I shall definitely be getting a 3G iPhone as soon as my existing contract expires (especially as my handset is slowly falling apart).

There are still some issues that I have with the iPhone. These are more correctable with firmware updates than hardware changes, and save for any major improvements in carrier networks’ hardware I don’t see that there’s much need for adding to the tech specs of the 3G iPhone anytime soon.

In my opinion, most gadget ‘freaks’ want the best of all their gizmos in one – the Buddhist (as I like to call them): one-with-everything types. This was (and still is) true of me… to a point. I’ve come to realise that it’s not always best. So I care little about the quality of the camera in the iPhone. It’s sufficient to know that it has a camera.

Little touches are more annoying: like the lack of a light, which I always find no end of uses for, besides for taking photographs. Or a non-removable battery – what the hell? These are about the only physical changes that would make the iPhone my perfect gadget (besides any changes to improve battery life and reduce cost of manufacture whilst improving end-user experience).

The real focus of this post though is the facility that we as end users are all being equipped with: a touch-screen interface. Here MultiTouch isn’t such a big deal. Arguably it does provide a few more natural ways to interact with the operating environment, but it still has its challenges.

However, just having an all-purpose piece of touch-screen kit on the open market (and at a good price) presents some potentially new and exciting ways for us to interact with our other gizmos.

Many are already exploring the possibilities of controlling other devices using the iPhone and iPod Touch. What I would really like to see are practical applications for the majority, not just experimental things that are restricted to specific enterprises. I’m talking about controlling household appliances from my iPhone/iPod Touch.

I realise that any device you want to control with your super-phone would need to be connected to the IP network and configured accordingly. But if manufacturers of these devices can be persuaded to see the potential, there are all sorts of wonderful possibilities.

Aside from my finger-marked status symbol becoming a universal remote control, it could provide interfaces and readouts for numerous appliances.

The real beauty of it is that a touch screen is not limited by what it can display. There’s no fixed set of commands. You don’t have to provide overlays to provide the right input. It’s all customisable and application-specific. It’s intuitive and easy to learn.

The iPhone is the start of many similar devices making their way into our homes. If we can come up with some useful applications it may be the only one we will need!

Google Friend Connect – the answer to a programmer’s prayers?

22.05.2008

Dear Tanya,

Google Friend Connect was officially announced last week. It has since come to light that this is no ordinary social network. In fact in the truest sense of the term, it isn’t a social network at all!

Friend Connect is a social tool. It seems the lovely people at the big G have been working their socks off finding a way to help us all connect a lot more easily. This fantastic tool is still only in preview, but from the fairly sketchy detail proffered by Google, it seems that even now it has a very wide appeal.

In order to place a perspective on what Friend Connect allows you to do, imagine: you are the only software developer on the planet. You, and you alone, have developed each and every web site in the world and you had to build each site from scratch.

So that means no code sharing. No data sharing. This means for every user who visits a web site that requires authentication they will have to register and verify their details. Then they will have to maintain them. This is the kind of data duplication that would make a data analyst’s butt clench.

By now you would have won plenty of awards… at the very least you would be long overdue some much-needed shut-eye.

The solution would seem obvious: extract the common data structures (in this case the individual member’s details) into a central pool that each site can get access to. That is what Friend Connect does!

You can close your mouth now.

As well as being a social connector, it is also a social enabler. What I mean is that a site that is not a social network (DamnILoveChocoDip.net for example) and never intends to be can still benefit from the viral nature of social networking. The social aspect of visiting a web site and sharing your experience is made even easier.

Yes as the modern social surfer you no longer have to re-register the same details you saved with the previous web site into this one. Simply log in with the same credentials et voila you’re in. Your data isn’t copied or duplicated. You keep the one central piece up to date and Friend Connect does the rest (as they say).

Beautiful! Adding to this is the entire social aspect. In theory (because I haven’t had a chance to test it yet) you can see who of your friends are registered with this site. They will see when you do things on the site (that you explicitly allow).

So all of a sudden, for the majority of web sites we visit, we can now see who else is there. It’s like all of the other patrons visiting this virtual shop/bar/resource centre become visible.

You could actually “bump” into someone who you know from another web site. As you seem to share a passion for at least two things, you might pluck up the courage to strike up a conversation, perhaps opening with an oft-hounded ‘chat-up line’.

It would be the equivalent of walking around the town, bumping into that cute girl in two different shops, suggesting you go for coffee, then quickly cut to 3 years later and your story is being told in some horrible romantic comedy starring Tom Hanks.

There are some questions on security and integration, all of which I’m sure will be answered in time. If you’re a programmer like me, I’m sure you will see the benefits though. I’m drooling over some of the possibilities. Least of all not having to build a registration engine every time I build a site. A close second is the speed with which people will discover a new web site.

Also in this orgy of social debauchery is OpenSocial. Now application developers have an opportunity for their applications to make it onto millions of web sites, not just the few major and accessible social networks.

The issue at the moment is that not all networks are supported, and indeed not all will want to jump on the bandwagon. But when things become this easy for users, where do you think the majority will go?

So then the key lessons from all of this are… build a site that can make use of this centralised platform! If you’re building an application, build it for OpenSocial as the chances are it will lead the way in terms of mass integration.

Is this a sign that Facebook may slip into Microsoft’s online “I just don’t get it” pit? I wonder why that could happen.

iPhone, iPod Touch and why Mobile Internet is the Future

22.04.2008

Dear Geraldine,

Yes I am begging for a verbal beating from all the nay-sayers… but Apple have pushed the boat out with full featured web browsing on the mobile platform.

Many non-converts will say that it’s missing some vital features – and they are right. Without Java and Flash on either version of Apple’s uber status symbol, they are missing out on some serious functionality and usability.

Let us not forget though that this is merely the first incarnation of these oft-heckled devices. And many of these issues can be resolved by a “simple” software update.

Why am I so bold? You would be right in thinking I sold out. Yes I bought an iPod Touch. Happy with my current cellular device I couldn’t justify the expense of the iPhone without the features I now expect from a modern handset – especially one with so much (dare I say?) “PDA-like” potential! – What should the next iPhone have?

I’m no analyst, marketer or even a prophet, but Apple have started setting trends again. With the full browsing experience, which will no doubt get even more feature-complete as time goes on and public demand grows, we can now browse the existing web – not having to create a whole new layer of the web to cater for our mobile nature.

This will annoy some to say the least. To all those who have spent their resources on extracting their existing online presence into something that can be neatly accessible to mobile users I say under my breath: “Waste of time”. See Mobile Web, Ubiquitous Web – it’s going to happen if you don’t believe me.

Yes the .mobi domains and re-creation of applications sans Javascript are now not worth pursuing! Why should we when our existing online presence can be viewed almost perfectly on these modern devices?

Forget demographics, usage stats and the like… the iPhone has now set the base standard for how we want to browse the web on the go. Users will demand it. Manufacturers will play to them. Developers must go with the flow!

As a web developer, I have a keen interest in this shift in how users interact with the online community. With the dawn of these devices, our mobile-enabled web applications can now be part of our existing application development model. Requests made to our existing systems will automatically adjust the content to suit the platform.
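As a very rough illustration of what “adjusting the content to suit the platform” could look like inside an existing PHP application, here is a sketch that sniffs the User-Agent header. The helper name, the template names and the sample User-Agent string are all assumptions for illustration, and real-world detection would need to be more careful than this.

```php
<?php
// Rough check for Mobile Safari on iPhone / iPod Touch. Hypothetical helper.
function isIphoneOrIpod(string $userAgent): bool
{
    return stripos($userAgent, 'iPhone') !== false
        || stripos($userAgent, 'iPod') !== false;
}

// In a real request this would come from $_SERVER['HTTP_USER_AGENT'].
// The string below is an illustrative sample, not captured from a device.
$ua = 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ '
    . '(KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3';

$template = isIphoneOrIpod($ua) ? 'mobile' : 'desktop'; // hypothetical template names
echo $template, "\n";
```

The point is that the same request handler serves both audiences; only the presentation layer changes.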

The creators of Facebook have managed to churn out a modest working example of this for the iPhone and iPod Touch.

Google too are constantly pushing forward with their developments for mobile users. But they have development teams and funding. Many (including me) are worried about whether the advertising model that Google has pioneered will be maintainable in this new era of web browsing – it is already showing signs of cracking. But this is a topic for further discussion.

What about Joe Small Business Owner – how will he get his web site to be truly accessible in this new arena? You could argue that he doesn’t need to worry because of how these devices manage the current web.

It will become more evident in time though that the web applications that run natively on the platform the user prefers will be the ones that are more favoured and better used.

In order to deliver this in a manageable and cost-effective solution we really need a product that gives its administrators the power to decide what aspects of their online presence will be available to their mobile visitors.

This power doesn’t rest with them at present and thus is forcing a chasm in the mobile web where services either do not exist or are so simple they would be better off if they didn’t exist!

As we move forward into this almost-virgin territory are any of us truly ready?