A Busy, Warm April

24.04.2010

April has been one hectic month! Between launching WhatGrowersUse.com, pre-launch marketing for Optimise and fixing PlantConnection, I’ve barely had a moment to stop and think!

Add to this the fact that I’ve been swotting up on all things dev. I love coding in PHP and today I managed to hack out a super slender piece of functionality for an awesome *top secret* project I’m working on!

I basically need to pull a massive XML feed (think >10MB) and process the contents into a database. So this is grand-scale processing of huge wads of XML data – something I’ve never really had to do before.

This may seem a bit backwards, but moving away from the XML is imperative. This project needs to be fast and agile. I don’t want to have to allocate huge chunks of memory to my whole PHP layer just to deal with a few records in various, disparate XML files.

Starting with SimpleXML

SimpleXML is a fantastic XML parser. Being able to turn XML into native PHP objects, arrays and variables is supremely handy. However, “Simple” is definitely the operative word! When it comes to huge amounts of XML data, SimpleXML just doesn’t cut it!

So in my search for an alternative, I came across something I’ve never used before: a little-known – and it seems little-used – PHP extension called XMLReader. My hero.

XMLReader saves the day

XMLReader is what is known as a pull parser… you grab what you need as the parser races through the XML at lightning speed. It streams the file into context too: reading, caching, splurging… so you can deal with the XML before the whole file is loaded into memory.

This is a huge advantage for massive files. As long as the XML is valid, you can run away processing elements and attributes at a blistering pace.

Something to remember though is that doing it this way is no good if you then try to process all of this data in memory-intensive variables or objects (which is the problem with SimpleXML). You either face increasing the memory allocation to PHP (which is limited by how much RAM you have) or you find a quick way to deal with the data and move on.

It’s grab-and-release… no time for processing here. Any kind of re-assignment of data where volumes exceed 30,000 reps of a while-loop will suck your memory dry. So no massive strings, no huge arrays and definitely no objects!
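To make the grab-and-release idea a bit more concrete, here is a minimal sketch. The feed layout (product elements with an id attribute and a name child) and the inline sample are assumptions for illustration only; a real feed would be far bigger and opened from a file or URL with open().

```php
<?php
// Hypothetical sample feed, kept inline so the sketch runs on its own.
$xml = <<<XML
<products>
  <product id="1"><name>Trowel</name></product>
  <product id="2"><name>Spade</name></product>
</products>
XML;

$reader = new XMLReader();
$reader->XML($xml); // for a real feed: $reader->open('/path/to/feed.xml');

$count = 0;
$currentId = null;

while ($reader->read()) {
    // Only opening tags are interesting here; skip text, whitespace and closing tags.
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }

    if ($reader->name === 'product') {
        // Grab the attribute while the cursor is on the element.
        $currentId = $reader->getAttribute('id');
    } elseif ($reader->name === 'name') {
        // readString() returns the text content of the current element.
        $name = $reader->readString();

        // Deal with the record straight away (this is where the DB write would go),
        // then let the variables be overwritten on the next pass.
        echo $currentId, ': ', $name, PHP_EOL;
        $count++;
    }
}

$reader->close();
```

Nothing is accumulated between iterations: only a couple of short strings are alive at any one time, however big the feed is.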

Execution Time

The only issue left is execution time. Between parsing the XML file and saving it to a local MySQL db, your script could take a lot longer than your default execution time.

XMLReader is pretty darn snappy and there’s really no way to improve that without digging into some C code and rebuilding the extension. The code that actually does stuff with the data from the XML is very minimal. The biggest challenge is the time spent writing to the database.

I’m using MySQL. The quickest method of writing to a database (save reformatting the data into some kind of delimited text file and using LOAD DATA INFILE) is mysqli prepared statements. The query is parsed once and then only the values change, which greatly reduces database load, so running thousands (or hundreds of thousands) of queries can be done in mere seconds.
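As a rough sketch of the prepared-statement approach: the function below prepares one INSERT and re-executes it for every row. The products table, its columns and the function name are hypothetical, and it expects an already open mysqli connection.

```php
<?php
/**
 * Insert many rows using a single prepared statement.
 * The `products` table and its columns are made up for this example.
 *
 * @param mysqli   $db   an open connection
 * @param iterable $rows arrays shaped like ['id' => 1, 'name' => 'Trowel']
 * @return int number of rows saved
 */
function saveRows(mysqli $db, iterable $rows): int
{
    // Prepared once, executed many times.
    $stmt = $db->prepare('INSERT INTO products (id, name) VALUES (?, ?)');
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $db->error);
    }

    // bind_param() binds by reference, so bind once and just change the variables.
    $id = 0;
    $name = '';
    $stmt->bind_param('is', $id, $name);

    $saved = 0;
    foreach ($rows as $row) {
        $id = (int) $row['id'];
        $name = (string) $row['name'];
        if ($stmt->execute()) {
            $saved++;
        }
    }

    $stmt->close();

    return $saved;
}
```

Because $rows is only iterated, it can be a generator fed by the XML parser, so the full data set never has to sit in an array.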

However, even this will be too slow if your max_execution_time is 30 seconds. But the only real way to speed MySQL up is a faster processor and improved disk-write speed (think SSDs). Those are expensive options.

The simplest option? Increase script execution time. If this isn’t a user page, you can be a little more relaxed on timeouts. You should be able to use the set_time_limit() function to increase execution time. In fact, each call to this function restarts the timeout counter from zero, which has the handy effect of extending your execution time by the limit you set, counted from the point where you call it.

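<?php
  // $xmlReader is an XMLReader that has already opened the feed
  while ( $xmlReader->read() ) {
    set_time_limit(2); // restart the timeout counter: two more seconds from here
    // ... get data, save to DB etc ...
  }
?>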

This will give each loop an extra two seconds to execute, which should be more than enough time to parse an extra few XML nodes and execute a prepared statement one more time.

So there it is. If you’d like to see the full code or have any questions, just ask; I’d be happy to share it. A huge thanks goes to Chad Fennell for his excellent post on XMLReader.

Say ‘No’ to Expensive SEO

10.03.2010

I was recently invited to the NEC to give a 20-minute seminar to those gathered for the annual PROMOTA trade show.

I Learned Something Yesterday

24.11.2009

I met three ace guys yesterday: Sean Leigh, Jeremy Harding and Alan Mann. They are all techy/web entrepreneurs and are making big bucks out of big ideas. They know some crazy people in the domaining field.

It turns out that there are a lot of wealthy people doing domaining. I found out yesterday that there are some people out there that own thousands of domains with no websites attached to them! Craziness! They obviously have a lot of money.

The reason I met these great guys though is because they found me and this brings me to a little tip: Get your name out there!

It turns out they found me through a service I signed up for and hardly use, Elance. The data from Elance was probably shared on a network of other services, which they stumbled across. My profile on this service is sketchy at best. But it was enough of a lead for them to do more digging.

Because I’m an active participant on a number of blogs, forums, Twitter etc. they managed to fill in the gaps pretty quick.

So if you want to be found, get active in your community! A word of warning though: Be prepared for a grilling (i.e. bring something amazing along).

Putting My DBA Hat On… Again

06.10.2009

It’s not often I have to worry too much about the minutiae of database administration… well, I try not to. But this question on StackOverflow got me intrigued, so I put on my trilby.

labratmatt was having a bit of a problem with inserting data into a MySQL table with a field defined as DECIMAL(3,2). Can you guess what his problem was? That’s right… 9.99! How did you guess?
This has got to be one of the most popular MySQL-related Google searches. The initial problem is easy to solve… correct the presumptuous field definition.
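For anyone who didn’t guess: in DECIMAL(M,D), M is the total number of digits and D is how many of them sit after the decimal point, so DECIMAL(3,2) leaves room for just one digit before the point. A tiny illustrative helper (the function name is made up, it is not part of MySQL or PHP) shows the arithmetic:

```php
<?php
// Largest value a DECIMAL(M,D) column can hold:
// M digits in total, D of them after the decimal point.
function decimalMax(int $precision, int $scale): string
{
    // Digits left over for the whole-number part.
    $whole = str_repeat('9', $precision - $scale);
    if ($whole === '') {
        $whole = '0';
    }

    return $scale > 0 ? $whole . '.' . str_repeat('9', $scale) : $whole;
}

echo decimalMax(3, 2), PHP_EOL; // 9.99
echo decimalMax(5, 2), PHP_EOL; // 999.99
```

So a column that needs to hold values up to 999.99 wants DECIMAL(5,2), not DECIMAL(3,2).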
However, the underlying problem is really why his data was being truncated, even inserted incorrectly. You may notice the same if you run MySQL (v5.0+) from a default setup on other field types: VARCHAR for example, where you set a maximum field length. When you INSERT data that is too long it simply gets truncated.
Not hugely worrying you may think, especially in development and testing phases. True. But this wasn’t enough for me, so I went on a hunt.
I found this interesting article by Robin Schumacher on MySQL Data Integrity.
It seems that there is a configuration variable in MySQL (v5.0+) called ‘sql_mode’ that determines exactly how strict MySQL should be when writing data to tables. The problem is that, by default, it’s unset, which means MySQL uses its standard mode… fudged SQL.
It has a vast array of options, so read through and choose wisely.
The default MySQL setup essentially turns all of your INSERT and UPDATE statements into INSERT/UPDATE IGNORE statements. It is an unexpected “gotcha” for many… any self-respecting software developer would want the INSERT query to fail and for the DBMS to tell you why it failed, not automatically munge the data for you.
To achieve this, the general option to use for ‘sql_mode’ is STRICT_ALL_TABLES… but even this has some gotchas (VARCHAR and TEXT expect only string values etc…) and may need to be combined with other options.
Of course, if you write your programs to send MySQL the correct datatypes, changing this option shouldn’t cause any problems :)
The annoying thing is that I’ve only just found out about this now after nearly 6 years of database development.
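If you can’t (or don’t want to) change the server configuration, the mode can also be set per connection. Here is a minimal sketch from PHP, assuming an open mysqli connection; the function name is just for illustration.

```php
<?php
/**
 * Switch the current connection to strict mode, so that out-of-range or
 * over-long values make the INSERT fail instead of being silently adjusted.
 */
function enableStrictMode(mysqli $db): bool
{
    // SET SESSION only affects this connection; the server-wide default is untouched.
    return (bool) $db->query("SET SESSION sql_mode = 'STRICT_ALL_TABLES'");
}
```

Call it straight after connecting, before any INSERT or UPDATE statements are run.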

Simple Usability Testing

23.09.2009
Silverback by ClearLeft is a wonderful usability testing application that makes it dead simple to record and manage your test sessions.
If like me though you can’t afford to shell out $50 right now, here’s a way to do some bargain basement usability recording.
All you will need is an Intel MacBook/MacBook Pro running Mac OS X 10.6. This comes with QuickTime X, which now has screen, audio and movie recording built in, so it can record what’s happening on the screen, video from a webcam and audio through your built-in microphone.
Ok so it’s not quite as polished as Silverback, but it works. Unfortunately, because you can’t start both the movie recording and the screen capturing at the same time, they will be out of sync for a few seconds.
If you have some video editing software that allows you to match them up and overlay (iMovie doesn’t do this yet), a quick bit of editing will go a long way when you need to review those test sessions!
The way I do it is:
  1. Start Movie recording (this is the webcam)
  2. Minimise the live preview
  3. Start Screen recording
At the end of the session:
  1. Stop Screen recording
  2. Stop Movie recording
If anyone knows of a way to get the two to start at the same time, that would be great (an AppleScript would probably do it, but I’m a virgin at it). Also, can you recommend any good cheap/free video editing tools that can handle movie on movie overlays?

Dropbox on Ubuntu Server

11.08.2009

In our office, we have a small custom, headless 32-bit PC running Ubuntu Server 8.10 (Intrepid Ibex). It’s ideally suited as our testing platform web server, file server, SVN server… well you get the picture.

I’ve been trying to set up a VPN through a Linksys-Cisco router we purchased (WRVS4400N), but have hit one snag after another (thanks to Cisco’s non-support of anything other than Windows).
Then it hit me: use Dropbox!
A few free Dropbox accounts are all we will need for now between us and it creates instant versioning and backups of all of our critical files – something we weren’t doing properly up until now – plus allowing us to interact with the file system locally rather than over the network.
A super idea!
Problem 1: Dropbox is not officially supported for command-line-only Linux distros. Thankfully though some nice people have put together a few handy instructions and scripts in order to make it work.
Problem 2: This installed Dropbox in a location that I didn’t want. Our server has a partitioned drive for security reasons. So all of our day-to-day files exist on one partition and the system files reside on the main partition. Dropbox was installed in the user folder I used when performing the install, which is in the system partition.
Without wanting to mess around too much trying to remove the current Dropbox install and then fiddle with Python code (which I have absolutely no experience with), I needed a quick method for getting some of our working files into the Dropbox folder in the system partition.
It turns out this is where Linux is super handy! Using standard symlinks to the folders in question was the perfect solution. Dropbox sees these as actual folders and synchronises across the link – up and down… meaning the files stay on the right partition, but now appear as part of the shared folder I wanted them in!
Win!
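The symlink trick is easy to try out safely. Here is a small sketch using PHP’s symlink() function (the equivalent of ln -s on the command line); all of the folder names are invented and everything happens inside a throwaway temp directory.

```php
<?php
// Throwaway sandbox standing in for the two partitions (all paths are made up).
$base    = sys_get_temp_dir() . '/dropbox-demo-' . uniqid();
$data    = $base . '/data/projects';     // where the working files really live
$dropbox = $base . '/home/user/Dropbox'; // where Dropbox ended up being installed

mkdir($data, 0777, true);
mkdir($dropbox, 0777, true);
file_put_contents($data . '/notes.txt', 'hello');

// Equivalent of: ln -s <data folder> <Dropbox folder>/projects
$link = $dropbox . '/projects';
symlink($data, $link);

// The file is reachable through the link, but is still stored in the original folder.
echo file_get_contents($link . '/notes.txt'), PHP_EOL;
```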

Overcoming Professional Prejudice

21.04.2009

I’ve just been on the phone with a potential client. From our brief chat it’s clear that they have experienced problems with “web” people before now. This has affected their view of our services at FlipStorm, even though they know nothing about us.

So, how do you overcome that kind of prejudice? You could turn to the salesman’s pitch… tell them all the guff they either already know or don’t want to know and spin it to make it sound like you’re the best. If they buy in, they’ve got to spend some more money and they might just get lucky.
If they are smart though (and your client is always smart, no matter how stupid they are!), they won’t go for any of that. So you need to toss them a bone. Prove to them that you are prepared to go that extra mile. Give them something for nothing… a favour!
Some of you may see this as flaring up the spec work debate, but before I start a urinating competition, I’d like to mention that there are absolutely no limits on how far you take this; it’s entirely up to you, if you think it will achieve the desired result without costing you too much. If you make it clear to the client that this is a gimme and that any work as follow-up from that will be payable then you’re in no danger of giving false impressions or cheapening your services.
Quite the opposite, in fact; it adds value to your services. It could be a deal-maker and something so simple to you that it takes you all of 10 minutes. Those 10 minutes are definitely worth a new customer!
Ah but, I hear you say, will that be a quality client? That depends largely on how strict you are with your freebies. Too much and clients get used to it, expect and eventually demand it.
We will have to wait and see if it pays off in this case, but I have found it to be genuinely worthwhile.

7 Steps to Reach the G-Spot

18.04.2009

By G-Spot I mean the first page of search engine results in Google. A couple of months ago I sent out this email to a friend of mine who asked me to analyse his website (Damian Brown Photography).

It’s quite a specific analysis of his site, but it can be used as a basic framework for most sites out there:
First off, page titles (as I call it, the <title> tag):
This is pretty key. Most search engines use this as the heading for the search result listing, the link that you click to go to the desired site after performing a search. This is one of the primary places a search engine will look for keywords. However, it shouldn’t be too long, as it will get clipped/truncated, and it should make some sort of sense. I know this may seem obvious, but there should only be one <title> tag on the page and it should always be inside the <head> section of the page.

META tags:
Right, to cut through all of the confusion, the only ones you really really really need are the description and the content-type ones. The description should be different for each page and should be no more than one intelligible paragraph about the contents of that page and if possible not just a paragraph that is already written on the page.

The content-type is a little more confusing, but suffice to say as long as it looks like this on every page of your site, you’re ok.

Of course, on other sites, this needs to be considered carefully. Web browsers use a number of methods for determining the correct content-type of the document and if they’re mismatched, you may end up with the wrong one and certain characters will come out with extra glyphs, especially if you don’t use ANSI codes for special characters (e.g. &#123;).

The rest of the META tags aren’t overly used and in the case of the keywords one, ignored altogether. Any META tags should appear inside the <head>[in here]</head> tags.

Valid HTML:
This is extremely important to search engines. Clean code means it’s easier for them to read your site and suggests that it will render well in the browser, which you’ll score brownie points for. Code that isn’t where it should be will confuse the search engine algorithms and they may even give up indexing your site completely until it’s sorted.

This is a difficult one to achieve as there’s a lot that goes into this. It comes down to having a good basic design and sticking to it. One thing I will say: make sure there is no code or content floating around in between the closing </head> tag and the opening <body> tag or after the closing </body> tag (except for the closing </html> tag).

Headings <h1> through <h6>:
Headings are also really important. If you think about the basics of print for a minute (this is where all this comes from anyway): When you open a book it has an index giving you a quick glance at all of the chapter headings. If you go to a chapter, you see its title in large, bold text at the top of the page. Then the content relevant to that subject is placed underneath and is generally organised by subheadings and paragraphs. This is so we can follow the train of thought without getting lost and easily pick up where we were if we do.

If we apply this principle to the web, it becomes very natural, but also meets some requirements of the search engines. So having a main heading on each page (the <h1> tag, there should only be one of these per page) that re-iterates the title of the page and then structuring any text into paragraphs of single thoughts, just like you learned in English lessons, will go a long way to improving the ease of reading, not only from a visitor’s point of view, but also the search engines’.
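If you’d rather check these basics with a script than by eye, here is a rough sketch using PHP’s DOM extension. It counts the title and h1 tags and looks for a meta description. The sample page is invented for the example; in practice you would load your own HTML instead.

```php
<?php
// Hypothetical page; swap in your own HTML (e.g. via file_get_contents()).
$html = <<<HTML
<!DOCTYPE html>
<html>
<head>
  <title>Wedding Photography in Birmingham</title>
  <meta name="description" content="Portfolio of wedding and portrait photography.">
</head>
<body>
  <h1>Wedding Photography in Birmingham</h1>
  <h2>Recent work</h2>
  <p>One thought per paragraph.</p>
</body>
</html>
HTML;

$doc = new DOMDocument();
// The @ hides parser warnings about markup libxml doesn't recognise.
@$doc->loadHTML($html);

// There should be exactly one of each.
$titles = $doc->getElementsByTagName('title')->length;
$h1s    = $doc->getElementsByTagName('h1')->length;

// Look for <meta name="description" ...>.
$hasDescription = false;
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strtolower($meta->getAttribute('name')) === 'description') {
        $hasDescription = true;
    }
}

echo "title tags: $titles", PHP_EOL;
echo "h1 tags: $h1s", PHP_EOL;
echo 'meta description: ', $hasDescription ? 'yes' : 'no', PHP_EOL;
```

It is no substitute for a proper HTML validator, but it catches the most common slip-ups (two h1 tags, a missing description) in a couple of seconds.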

Flash:
For the most part search engines can’t read Flash content. A search engine basically sees what you would see if you did a “View Source” in your web browser. They use the text they see to determine what the page is about, how relevant and up to date it is etc etc. If that text is in Flash it won’t see it. If there’s any major bulk of text in a Flash file that plays on your site, it needs to come out and sit on the page somehow.

I don’t think you’ve got this problem as most of the flash you’re using seems to be image galleries, which is fine for the most part. There are alternatives to Flash which could improve your site in this regard, but it’s not essential.

Links:
Firstly navigational links on your site should be clear and steady. By this I mean that as you move from one page to the next, they should stay in the same place. They can also serve as a visual cue as to what page the visitor is on, so links that disappear when you’re on that page can be a little confusing.

Visitors should be able to get to almost any page from any page. So rather than having to leave a trail of breadcrumbs, they can simply see where they were when they read that really interesting part/saw that really good photo.

Secondly, links from other websites. Getting other sites linking to your website is another key from a search engine’s point of view. But rather than getting hundreds or thousands of links from websites all over, it’s better to have even just a few that are more relevant to your field of expertise. And the more natural the link looks on the other person’s/company’s website, the greater the chance that it will improve your ranking. E.g.

A link that says “Click Here!” is not quite as useful to Google as one that says “Birmingham photographer portfolio” or something similar. Can you see why?

If you can encourage people to link to your site or write an article about you or something like that, chances are it will be more natural.

Others:
Some search engines use a simple datafile to help identify pages on your site. It’s called a sitemap XML file. This is a bit complicated and techie, but setting one of these up can complement a well-delivered website and make sure that you tick all the boxes from the search engine’s point of view.
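It is less scary than it sounds. Here is a minimal sketch that builds a sitemap with PHP’s DOM extension, following the sitemaps.org format; the example.com URLs and the dates are placeholders.

```php
<?php
// Hypothetical list of pages: URL => date the page last changed.
$pages = [
    'http://www.example.com/'          => '2009-04-18',
    'http://www.example.com/portfolio' => '2009-04-10',
];

// Namespace defined by the sitemaps.org protocol.
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$urlset = $doc->createElementNS($ns, 'urlset');
$doc->appendChild($urlset);

foreach ($pages as $address => $changed) {
    $url = $doc->createElementNS($ns, 'url');

    // createTextNode() takes care of escaping characters such as & in URLs.
    $loc = $doc->createElementNS($ns, 'loc');
    $loc->appendChild($doc->createTextNode($address));

    $lastmod = $doc->createElementNS($ns, 'lastmod');
    $lastmod->appendChild($doc->createTextNode($changed));

    $url->appendChild($loc);
    $url->appendChild($lastmod);
    $urlset->appendChild($url);
}

$sitemap = $doc->saveXML();
echo $sitemap;
```

Save the output as sitemap.xml in the root of the site and the search engines can pick it up from there.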

Also content freshness is an area to consider. Although I have found that this doesn’t have to be too dramatic, some changes every now and then help to keep your site on the map so to speak.

The points here are pretty obvious if you’ve been doing SEO for a while. But they need to be monitored to make sure you continue to comply.

Of course, if your site is built on a well-written CMS or other standards-compliant platform/framework/application – such as EDDy™, FlipStorm’s web application development platform – it will tackle most of these steps for you, enforce some others, and encourage you to respect the rest.

Free Web Site Advice

28.01.2009

The web developer and design community is growing super large and because of that it has been an endless source of advice and inspiration to me. As a large thank you gesture I want to offer my professional advice and support in any areas I can.

It may not be worth much in some cases (hence why it’s going to be free), but hopefully it will help. I also want to use it to challenge myself with some new stuff that I’m just not doing enough of!
All designers and developers face challenges. I have faced plenty. I haven’t blogged about all of them yet (nor do I intend to!), but some of the things I’ve accomplished and overcome may be useful to others. I think that sharing that information is a must to the continuing evolution of software development. I also believe that it should be a free service.
So please feel free to leave comments or send me messages somehow (Twitter, email etc.) asking about anything you would like help with. I will take on almost any challenge related to software – particularly focussed on PHP, Flex, HTML, CSS, JavaScript, AIR, RIAs and related topics, but don’t feel tied down to just these ones. Ask away!