A Busy, Warm April

24.04.2010

April has been one hectic month! Between launching WhatGrowersUse.com, pre-launch marketing for Optimise and fixing PlantConnection, I’ve barely had a moment to stop and think!

Add to this the fact that I’ve been swotting up on all things dev. I love coding in PHP and today I managed to hack out a super slender piece of functionality for an awesome *top secret* project I’m working on!

I basically need to pull a massive XML feed (think >10MB) and process the contents into a database. So this is grand-scale processing of huge wads of XML data – something I’ve never really had to do before.

This may seem a bit backwards, but moving away from the XML is imperative. This project needs to be fast and agile. I don’t want to have to allocate huge chunks of memory to my whole PHP layer just to deal with a few records in various, disparate XML files.

Starting with SimpleXML

SimpleXML is a fantastic XML parser. Being able to turn XML into native PHP objects, arrays and variables is supremely handy. However, “Simple” is definitely the operative word! When it comes to huge amounts of XML data, SimpleXML just doesn’t cut it!

So in my search for an alternative, I came across something I’ve never used before: a little-known – and it seems little-used – PHP extension called XMLReader. My hero.

XMLReader saves the day

XMLReader is what is known as a pull parser… you grab what you need as the parser races through the XML at lightning speed. It streams the file into context too: reading, caching, splurging… so you can deal with the XML before the whole file is loaded into memory.

This is a huge advantage for massive files. As long as the XML is valid, you can run away processing elements and attributes at a blistering pace.
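
To make the pull-parser idea concrete, here is a minimal sketch rather than the project's real code. The feed, its element names and the $rows array are made up for illustration, and the XML comes from a string so the snippet runs on its own; a real feed would be opened with $reader->open() and a file path.

```php
<?php
// Hypothetical feed: a tiny stand-in for the real multi-megabyte file.
$xml = '<products>'
     . '<product id="1"><name>Trowel</name></product>'
     . '<product id="2"><name>Hoe</name></product>'
     . '</products>';

$reader = new XMLReader();
$reader->XML($xml); // for a file on disk: $reader->open('/path/to/feed.xml')

$rows = [];
$currentId = null;

// read() advances one node at a time; only the current node is held in memory.
while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue; // skip text nodes, closing tags, whitespace etc.
    }
    if ($reader->name === 'product') {
        $currentId = $reader->getAttribute('id');
    } elseif ($reader->name === 'name') {
        // readString() returns the text content of the current element.
        $rows[] = [$currentId, $reader->readString()];
    }
}
$reader->close();
```

In a real import you would hand each row straight to the database instead of collecting it in $rows; the array is only there so the result can be inspected.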

Something to remember though is that doing it this way is no good if you then try to process all of this data in memory-intensive variables or objects (which is the problem with SimpleXML). You either face increasing the memory allocation to PHP (which is limited by how much RAM you have) or you find a quick way to deal with the data and move on.

It’s grab-and-release… no time for processing here. Any kind of re-assignment of data where volumes exceed 30,000 reps of a while-loop will suck your memory dry. So no massive strings, no huge arrays and definitely no objects!

Execution Time

The only issue left is execution time. Between parsing the XML file and saving it to a local MySQL db, your script could take a lot longer than your default execution time.

XMLReader is pretty darn snappy and there’s really no way to improve that without digging into some C code and rebuilding the extension. The code that actually does stuff with the data from the XML is very minimal. The biggest challenge is the time spent writing to the database.

I’m using MySQL. The quickest method of writing to a database (save reformatting the data into some kind of delimited text file and using LOAD DATA INFILE) is mysqli prepared statements. This greatly reduces database load, and running thousands (or hundreds of thousands) of queries can be done in mere seconds.
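
A rough sketch of the prepare-once, execute-many pattern follows. mysqli needs a live MySQL server, so this uses PDO with an in-memory SQLite database as a stand-in (assuming the pdo_sqlite extension is available); with mysqli the same shape would use prepare(), bind_param() and execute(). The feed, table and column names are hypothetical, and wrapping the loop in a transaction is an extra assumption, not something taken from the original script.

```php
<?php
// Hypothetical feed; a real one would be opened with $reader->open($path).
$xml = '<items><item sku="A1" price="2.50"/><item sku="B2" price="4.00"/></items>';

// Stand-in database: in-memory SQLite via PDO (assumes pdo_sqlite is installed).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE items (sku TEXT PRIMARY KEY, price TEXT)');

// Prepare once, outside the loop; only the bound values change per row.
$insert = $db->prepare('INSERT INTO items (sku, price) VALUES (:sku, :price)');

$reader = new XMLReader();
$reader->XML($xml);

$db->beginTransaction(); // one commit for the whole batch, not one per row
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Grab the values and hand them straight to the statement.
        $insert->execute([
            ':sku'   => $reader->getAttribute('sku'),
            ':price' => $reader->getAttribute('price'),
        ]);
    }
}
$db->commit();
$reader->close();

$count = (int) $db->query('SELECT COUNT(*) FROM items')->fetchColumn();
```

Nothing is accumulated in PHP between reading a node and writing it, which keeps to the grab-and-release approach.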

However, even this will be too slow if your max_execution_time is 30 seconds. But the only real way to speed MySQL up is a faster processor and improved disk-write speed (think SSDs). Those are expensive options.

The simplest option? Increase script execution time. If this isn’t a user page, you can be a little more relaxed on timeouts. You should be able to use the set_time_limit() function to increase execution time. In fact, each call to this function restarts the timeout counter from zero, so calling it repeatedly keeps handing your script a fresh allowance of however many seconds you set.

<?php
  while ( $xmlReader->read() ) {
    // Restart the timeout counter: this iteration gets a fresh 2 seconds.
    set_time_limit(2);
    // ... get data, save to DB etc ...
  }
?>

This gives each iteration of the loop a fresh two seconds to execute, which should be more than enough time to parse an extra few XML nodes and execute a prepared statement one more time.

So there it is. If you’d like to see the full code or have any questions, just ask; I’d be happy to share it. A huge thanks goes to Chad Fennell for his excellent post on XMLReader.

Wouldn’t It Be Great If All APIs Followed RESTful Principles?

02.12.2009

APIs and connecting web applications together is going to be the next challenge of the evolution of the web. The next decade should see easier-to-implement, yet more secure methods for connecting the various web applications that we use.

Announcing “reactor”

13.11.2009

Ok, so it’s not a big announcement because the app hasn’t actually launched yet. But I’ve launched the holding site.

Putting My DBA Hat On… Again

06.10.2009

It’s not often I have to worry too much about the minutiae of database administration… well, I try not to. But this question on StackOverflow got me intrigued, so I put on my trilby.

labratmatt was having a bit of a problem with inserting data into a MySQL table with a field defined as DECIMAL(3,2). Can you guess what his problem was? That’s right… 9.99! How did you guess?
This has got to be one of the most popular MySQL-related Google searches. The initial problem is easy to solve… correct the presumptuous field definition.
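For anyone who didn’t guess: DECIMAL(3,2) means three digits in total, two of them after the decimal point, which leaves room for just one digit in front. A small sketch of that arithmetic, using a made-up helper name (decimalMax is not a MySQL or PHP function):

```php
<?php
// Largest value a DECIMAL(precision, scale) column can hold.
function decimalMax(int $precision, int $scale): string
{
    // $precision digits in total, $scale of them after the decimal point.
    $allNines = (10 ** $precision) - 1; // e.g. 999 for precision 3
    return number_format($allNines / (10 ** $scale), $scale, '.', '');
}

echo decimalMax(3, 2), "\n"; // 9.99
echo decimalMax(5, 2), "\n"; // 999.99
```

So a column meant to hold prices up to 999.99 would need DECIMAL(5,2), not DECIMAL(3,2).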
However, the underlying problem is really why his data was being truncated, even inserted incorrectly. You may notice the same if you run MySQL (v5.0+) from a default setup on other field types: VARCHAR for example, where you set a maximum field length. When you INSERT data that is too long it simply gets truncated.
Not hugely worrying you may think, especially in development and testing phases. True. But this wasn’t enough for me, so I went on a hunt.
I found this interesting article by Robin Schumacher on MySQL Data Integrity.
It seems that there is a configuration variable in MySQL (v5.0+) called ‘sql_mode’ that determines exactly how strict MySQL should be when writing data to tables. The problem is that, by default, it’s unset, which means MySQL uses its standard mode… fudged SQL.
It has a vast array of options, so read through and choose wisely.
The default MySQL setup essentially turns all of your INSERT and UPDATE statements into INSERT/UPDATE IGNORE statements. It is an unexpected “gotcha” for many… any self-respecting software developer would want the INSERT query to fail and for the DBMS to tell you why it failed, not automatically munge the data for you.
To achieve this, the general option to use for ‘sql_mode’ is STRICT_ALL_TABLES… but even this has some gotchas (VARCHAR and TEXT expect only string values etc…) and may need to be combined with other options.
Of course, if you write your programs to send MySQL the correct datatypes, changing this option shouldn’t cause any problems :)
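As a minimal sketch, assuming a stock config file with a [mysqld] section (the file’s name and location vary by install), switching strict mode on server-wide could look like this:

```ini
[mysqld]
# Reject out-of-range or over-length values instead of silently adjusting them.
sql_mode = STRICT_ALL_TABLES
```

To try it on a single connection first, without touching the server config, the equivalent statement is SET SESSION sql_mode = 'STRICT_ALL_TABLES';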
The annoying thing is that I’ve only just found out about this now after nearly 6 years of database development.

HTML 5, XHTML 2 – Web 2.5

28.12.2008

I’ve been doing a lot of reading up on HTML5 and XHTML2. I know neither of these technologies is anywhere near well-supported enough to start using in production. However, we should all be starting to get our heads around the changes, not only to be ready for the shift, but also for the benefits it will bring.

There has been a lot of hype in design and developer circles for a good few years surrounding all of this, especially for proponents of the so-called semantic web – the supposed natural evolution of the web.
However, Tim Berners-Lee (seen by many as the father of the web as we know it today) has already suggested that this semantic web will only make up part of what we will come to call Web 3.0. How much of it will be the semantic web is yet to be seen (if much at all given its progress!)… more to the point, how much of an impact these impending technologies will have on the semantic web is a little hard to judge.
It strikes me that two milestones matter here: the point at which this next phase of syntactical changes becomes an officially approved and ratified recommendation by the powers that be (some time around 2012), and the point at which it can be considered globally adopted (probably 4-to-5 years after that recommendation, similar to XHTML1.0). Reaching them will amount to half an evolutionary cycle.
If it does bring us anywhere close to the intended semantic nature of the web, it will, at best, be only half way there. So I’m going to go out on a limb and predict that sometime in 2015 we will confidently say we have reached Web 2.5.
Of course this is assuming we’re still here in a fashion. And that this stuff moves on apace. With the current fairly good awareness of standards and best practice, I believe that a small nudge from some big players may impact things for the better. Say Google adjusts various search algorithms to favour HTML5 sites in search listings… we all know that clients will notice and designers and developers will have to pay heed!
So it really is best to start now. Most of the changes (particularly towards HTML5) aren’t major. In fact, as can be seen, they should simplify our lives an awful lot! The problem is that there’s a lot of web out there to change. You can’t just change tags and roll – the implications are far greater: you have to consider CSS, the impact on any server side scripting used, browser rendering and their differences (especially for the new controls)…
This all sounds a little painful. For those using any kind of Web App platform with a good templating structure, this should be fairly easy: set up a new HTML5 template. The only complication to consider then is script-generated mark-up. And that should be tackled by the vendor.
Of course this is somewhat subjective as browser support is sketchy at best, so it’s hard to test any development in this area. Looking forward, we should be seeing greater support of these technologies in coming months. For now it’s probably best to glean what you can from the WHATWG and if you’re a developer building a CMS or other Web App you can probably start writing some test cases and replacement libraries just to stay ahead of the curve.

CMSs != Web Apps? WTF?

23.12.2008


Well it’s coming to the end of 2008 and there’s nothing left to say, but:

Why Are CMSs Not Included In The “Web Apps” Balloon?

There was a time when going up into the sky was every boy’s (and some girls’) dream. To be like a bird, fluttering at high altitudes in the cold, thin air… so naive…
Sorry… rambling again. My point is that mankind poured decades of research into flight, spanning many centuries. IMO, one of the greatest achievements in the field is the hot air balloon (although I’ve never been in one).
As I see it, the hot air balloon is a brilliant and exhilarating contraption. But to ride one with any sense of control (and I am a control freak) takes an awful lot of know how. You have to be a pretty cool kid to guide a hot air balloon with any safety.
The same is true with web apps. They’re a wild giant canvas filled with the piping hot air of incidence and popularity, warily directed by the most aimless sense of direction conceived (in some cases). But it seems the basket is too full for you, you and you!
That’s right, you’re not a web app, soaring among the hilltops, if you’re a CMS or other such ilk. However, I contest this. A CMS and a web app are interchangeable. Henceforth, I will no longer think in terms of a CMS. All such things will be called web applications – plus I think CMS is really formal and business-y and I don’t like that, not one bit.
So let us all jump aboard the web app balloon and bring it swiftly earthward!
On a slightly less diabolical note: 24ways has been brilliant this year. There have been some terrific advancements this year and I can’t wait to see what 2009 brings! Let’s go together.
END;

__get, __set, __construct and MySQL_Result::fetch_object()

16.12.2008

I realise this is my first blog post in a long while. Despite the fact that I want to make it interesting, I am terribly busy. However I needed to get this off my chest as it has been bugging me for ages (hopefully it will help some of you!).

While researching faster methods for data retrieval and manipulation for my *TOP SECRET PROJECT* I found that, as of a very recent version of PHP, the mysqli_result class’ method ‘fetch_object’ allows you to specify a class to dump the object into.

My first thoughts were: fantastic! What a better method than what I was doing previously. I can instantiate an object and set all the properties all off the back of the query – dynamic properties while still inheriting all methods and hard-coded properties… excellent!

As I continued down this route I found some unusual behaviour though. First of all, it appears that a call to $result->fetch_object('Class') makes use of the magic method __set() *EVEN WITHOUT __set() BEING DEFINED IN THE CLASS*. Now there’s no note of that in the docs!

So what happens when I do define a __set() method? Interestingly $result->fetch_object('Class') then uses the __set() method I define. Good, it behaves. But then I found something else. A little irritating “bug” is that, upon calling $result->fetch_object('Class'), it constructs the object in an unusual order.

Perhaps it’s just me and my thought process, but I would’ve gone for:

  • Check if __construct() is defined
  • Instantiate object
  • Check if __set() is defined
  • Set properties

However the actual process order is:

  • Check if __set() is defined
  • Set properties
  • Check if __construct() is defined
  • Instantiate object

This threw me a little, especially as there’s little documentation detailing the behaviour of mysqli_result::fetch_object() when passing it a class name. It does make some sense, but it’s not the human way of doing it (if my way is human :P ). I tend to think like ‘build box, put toys in’, rather than ‘pile toys up, build box around toys,’ but that’s me.
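
The ordering can be seen in a small sketch. mysqli needs a live MySQL server, so this swaps in PDO with an in-memory SQLite database (assuming pdo_sqlite is available); PDO’s FETCH_CLASS mode is documented to assign properties before calling the constructor, which is the same order described above, and its FETCH_PROPS_LATE flag reverses it. The Toy class and toys table are made up for illustration.

```php
<?php
// Hypothetical row class; the property is declared so nothing depends on __set().
class Toy
{
    public $name;
    public $nameSeenByConstructor;

    public function __construct()
    {
        // Record what the constructor can already see when it runs.
        $this->nameSeenByConstructor = $this->name;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE toys (name TEXT)');
$db->exec("INSERT INTO toys (name) VALUES ('robot')");

// Default order: columns are assigned first, then __construct() runs.
$stmt = $db->query('SELECT name FROM toys');
$stmt->setFetchMode(PDO::FETCH_CLASS, 'Toy');
$early = $stmt->fetch();

// FETCH_PROPS_LATE flips it: __construct() runs first, then columns are assigned.
$stmt = $db->query('SELECT name FROM toys');
$stmt->setFetchMode(PDO::FETCH_CLASS | PDO::FETCH_PROPS_LATE, 'Toy');
$late = $stmt->fetch();
```

In the default mode the constructor already sees ‘robot’ (toys piled up, box built around them); with FETCH_PROPS_LATE it sees nothing yet (box first, toys after). As far as I know mysqli’s fetch_object() has no equivalent flag, so there the constructor simply has to cope with properties that are already populated.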

But now I’ve sorted it and it’s out there! Let me know if that helps you.

Whine, whine, whine… That’s all you do!

12.11.2008

We (and by we I mean the company I work for) launched one of the most interesting web sites we’ve produced to date.

ComplaintCommunity™ is the brilliant brain-child of Neil Gleeson (our client), and it has finally been realised with the support of a very talented developer, Ben Gilkes, along with a smart design by Elaine Haywood – our superb in-house designer.
ComplaintCommunity is set to turn social networking back into community spirit and ultimately consumer power. Neil is an expert on these things and has put so much thought into the project; he knew what he wanted to get, and Ben knew how to deliver. A superb client – plus he always brought us blueberry AND chocolate muffins!
This brings to mind Paul Boag’s recent post about client relationships. I can’t say that any relationship is perfect, but it looks like this was a fairly good example. Evidence of this is how much Neil is behind the project and how excited our team is to hear about its growth.
Check out ComplaintCommunity.com. It is still developing, so expect improvements as it grows. Also, check out where I work: we made ComplaintCommunity.com.

Progressive Enhancement, Graceful Degradation and Legacy Support

30.10.2008

Are we still supporting browsers that have had their day? It seems the simple answer is “hell yes!” I ask why…

Zend Studio 6.1, TinyMCE, and Scotland

29.09.2008

Again I find my blog post titles not really showing much coherence… if you follow the TWiT podcast you’ll notice my titles are starting to follow the same ethos: it’s more a summary of what’s covered than an introduction… Still it works for now.

I haven’t posted in an age and three-quarters because I went away to Scotland for a week during September. I’ve had difficulty keeping up with all the stuff that’s happening in the world of late. It seems that financial markets are going to pot. So for the time being I will be keeping my head down trying to secure my very bleak future. I suggest you do the same.
In slightly lighter news, Zend released Framework 1.6 and Studio 6.1. Also I noticed that my favourite, open-source, JavaScript-based WYSIWYG editor, TinyMCE, has recently been updated.
Back to work!