jQTouch: Building Web Apps That Work Everywhere

08.06.2010

We’re running a free workshop at our offices next Monday starting at 11am for developers (and designers) interested in building web apps that work on mobile devices. It’s free to come along, just get your name on the list before places fill up.
Read the rest…

Free Networking Event & Win an iPod Touch

17.05.2010

We love events. And we love giving away free stuff for doing nothing. So come along to the free networking event next week to say hi, then join our Facebook group and win an iPod. Read the rest…

We’re Hiring!

13.05.2010

FlipStorm is growing and now we’re looking for a web developer to join us! Read the rest…

A Busy, Warm April

24.04.2010

April has been one hectic month! Between launching WhatGrowersUse.com, pre-launch marketing for Optimise and fixing PlantConnection, I’ve barely had a moment to stop and think!

Add to this the fact that I’ve been swotting up on all things dev. I love coding in PHP and today I managed to hack out a super slender piece of functionality for an awesome *top secret* project I’m working on!

I basically need to pull a massive XML feed (think >10MB) and process the contents into a database. So this is grand-scale processing of huge wads of XML data – something I’ve never really had to do before.

This may seem a bit backwards, but moving away from the XML is imperative. This project needs to be fast and agile. I don’t want to have to allocate huge chunks of memory to my whole PHP layer just to deal with a few records in various, disparate XML files.

Starting with SimpleXML

SimpleXML is a fantastic XML parser. Being able to turn XML into native PHP objects, arrays and variables is supremely handy. However, “Simple” is definitely the operative word! When it comes to huge amounts of XML data, SimpleXML just doesn’t cut it!

So in my search for an alternative, I came across something I’ve never used before: a little-known – and it seems little-used – PHP extension called XMLReader. My hero.

XMLReader saves the day

XMLReader is what is known as a pull parser… you grab what you need as the parser races through the XML at lightning speed. It streams the file into context too: reading, caching, splurging… so you can deal with the XML before the whole file is loaded into memory.

This is a huge advantage for massive files. As long as the XML is valid, you can run away processing elements and attributes at a blistering pace.
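To make the pull-parser idea concrete, here is a minimal sketch of streaming a feed with XMLReader. The feed layout (<products>, <product id="…">, <name>) and the temp file are assumptions made up for illustration, not the real feed. The $rows array exists only so the sketch has something to show; on a large feed each record would be written out straight away rather than collected.

```php
<?php
// Build a tiny sample feed in a temp file; a real feed would be far larger.
// The element and attribute names are hypothetical.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents(
    $path,
    '<products>'
    . '<product id="1"><name>Trowel</name></product>'
    . '<product id="2"><name>Seed tray</name></product>'
    . '</products>'
);

$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Could not open $path");
}

$rows = [];        // kept only for demonstration
$currentId = null; // id of the <product> we are currently inside

// read() advances one node at a time, so only a small part of the
// file is held in memory at any moment.
while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue; // skip text nodes, closing tags, whitespace
    }
    if ($reader->name === 'product') {
        $currentId = $reader->getAttribute('id');
    } elseif ($reader->name === 'name') {
        // readString() returns the text content of the current element.
        $rows[] = $currentId . ': ' . $reader->readString();
    }
}

$reader->close();
unlink($path);
```

The loop only ever looks at the current node, which is what keeps memory use flat however long the file gets.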

Something to remember though: doing it this way is no good if you then pile all of this data into memory-intensive variables or objects (which is the problem with SimpleXML). You either face increasing the memory allocation to PHP (which is limited by how much RAM you have) or you find a quick way to deal with the data and move on.

It’s grab-and-release… no time for processing here. Any kind of re-assignment of data where volumes exceed 30,000 reps of a while-loop will suck your memory dry. So no massive strings, no huge arrays and definitely no objects!

Execution Time

The only issue left is execution time. Between parsing the XML file and saving it to a local MySQL db, your script could take a lot longer than your default execution time.

XMLReader is pretty darn snappy and there’s really no way to improve that without digging into some C code and rebuilding the extension. The code that actually does stuff with the data from the XML is very minimal. The biggest challenge is the time spent writing to the database.

I’m using MySQL. The quickest method of writing to a database (short of reformatting the data into some kind of delimited text file and using LOAD DATA INFILE) is mysqli prepared statements. Preparing the statement once and executing it over and over greatly reduces database load, so thousands (or hundreds of thousands) of queries can be run in mere seconds.
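As a rough sketch of the prepared-statement approach, the function below prepares one INSERT and re-executes it for each record. The function name insertRecords, the table and column names (products, feed_id, name) and the shape of each record are all hypothetical; only the mysqli calls themselves (prepare, bind_param, execute, close) are the standard API.

```php
<?php
/**
 * Insert records one at a time using a single prepared statement.
 * Table and column names here are made up for illustration.
 *
 * @param mysqli   $db      an open mysqli connection
 * @param iterable $records items shaped like ['id' => ..., 'name' => ...]
 * @return int number of rows written
 */
function insertRecords(mysqli $db, iterable $records): int
{
    $stmt = $db->prepare('INSERT INTO products (feed_id, name) VALUES (?, ?)');
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $db->error);
    }

    // bind_param() binds by reference, so changing these variables
    // changes what the next execute() sends.
    $feedId = 0;
    $name = '';
    $stmt->bind_param('is', $feedId, $name);

    $written = 0;
    foreach ($records as $record) {
        $feedId = (int) $record['id'];
        $name = (string) $record['name'];
        if (!$stmt->execute()) {
            throw new RuntimeException('Insert failed: ' . $stmt->error);
        }
        $written++;
    }

    $stmt->close();
    return $written;
}
```

Because the function accepts any iterable, it could be fed by a generator that yields one record at a time from the XMLReader loop, so nothing large ever sits in memory. The connection would come from something like new mysqli('localhost', 'user', 'password', 'feeds'), with placeholder credentials.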

However, even this will be too slow if your max_execution_time is 30 seconds. But the only real way to speed MySQL up is a faster processor and improved disk-write speed (think SSDs). Those are expensive options.

The simplest option? Increase script execution time. If this isn’t a user page, you can be a little more relaxed on timeouts. You should be able to use the set_time_limit() function to increase execution time. In fact this function has a handy habit: every call restarts the timeout counter from zero, so calling it repeatedly keeps pushing the deadline back by the limit you set.

<?php
  while ( $xmlReader->read() ) {
    // Restart the timeout counter on every pass through the loop
    set_time_limit(2);
    // ... get data, save to DB etc ...
  }
?>

This gives each pass through the loop a fresh two seconds to execute, which should be more than enough time to parse an extra few XML nodes and execute a prepared statement one more time.

So there it is. If you’d like to see the full code or have any questions, just ask and I’d be happy to share it. A huge thanks goes to Chad Fennell for his excellent post on XMLReader.

The Blog Run

22.03.2010

It’s Monday morning and that means two things: 1) sorting out my to-dos, and 2) the blog run – catching up on all of the latest news and articles and boiling it all down into a sweet syrup to fill a tasty pastry, web news pop-tart.
Read the rest…

NorthScale – Data Elasticity for Growing Web Applications

16.03.2010

We were really pleased to help NorthScale Inc. finish the public launch of their website. This morning TechCrunch wrote a brief review about NorthScale that’s already attracting major attention.
Read the rest…

Say ‘No’ to Expensive SEO

10.03.2010

I was recently invited to the NEC to give a 20-minute seminar to those gathered for the annual PROMOTA trade show.
Read the rest…

The Partnership

09.03.2010

We recently did some work on the relaunch of the website of Midlands-based design agency The Partnership. We were pleased to help them get their new site up and running.
Read the rest…

When I Get an iPhone and an iPad

06.03.2010

I still don’t have an iPhone. I’ve wanted one from the moment it was released, but I think I’m just mistiming my upgrades… I plumped for the iPod touch and it has been ok for me for a while. I will get an iPhone this year (I need a smartphone). But I also want to give the iPad a spin. So what do I want if I’m using both devices?
Read the rest…

My Favourite Mac Apps – So Far

03.03.2010

I keep meaning to do a list of cool things I have found recently. I always said when I bought my Mac that I would document my experience… I have managed to blog about certain things, but not always the ones I wanted to.
Read the rest…