So, ahh, what have you been doing for the past (almost) two years?

Often I ask myself this very question, or a question quite similar at least, differing only in the period of time that would be, at that time, in question. The question is usually rhetorical, an expression of wonderment at just how little it seemed I had achieved over the stated period of time. This time however, it seems I have an answer.

Did that make any sense at all? Didn’t think so.

Anyway, for a while Twitter was restricting access to full archives of users’ tweets due to apparent database problems. Just the other day I noticed that it seems they’ve restored full access, right back to the beginning.

So, fearing the loss (as I always seem to do) of data that would be irreplaceable if lost, I set about backing up my Twitter archive. Yes, it would be ideal if Twitter itself offered a backup utility, but of course they don’t. Most of the third-party backup tools were rubbish as well, and I had almost resigned myself to saving each page by hand.

If it weren’t for the 100 requests per hour API limit, my method of punching in

curl -O "http://twitter.com/statuses/user_timeline/phocks.xml?page=[1-243]"

to my MediaTemple shell account would have worked a treat, and it would work for a lot of people who don’t have so many updates. This method also has the advantage of storing the updates in the versatile XML format.
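For anyone who would rather pace the requests than run into the limit, here is a rough Python sketch of the same page-by-page download. It is only an illustration: the URL pattern is copied from the curl command above, while the function names (`page_urls`, `seconds_between_requests`, `download_all`) and the idea of spacing requests evenly are my own assumptions, not anything Twitter documents.

```python
import time
import urllib.request

# URL pattern taken from the curl command above; everything else is hypothetical.
BASE = "http://twitter.com/statuses/user_timeline/{user}.xml?page={page}"


def page_urls(user, last_page):
    """Build one URL per archive page, from page 1 up to last_page inclusive."""
    return [BASE.format(user=user, page=n) for n in range(1, last_page + 1)]


def seconds_between_requests(limit_per_hour=100):
    """Evenly spaced delay that keeps us at or under the hourly limit."""
    return 3600 / limit_per_hour


def fetch(url):
    """Download one page and return the raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, fetch=fetch, sleep=time.sleep, delay=36.0):
    """Fetch every URL in order, pausing for `delay` seconds between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i:  # no need to wait before the very first request
            sleep(delay)
        pages.append(fetch(url))
    return pages
```

At 100 requests per hour the delay works out to 36 seconds, so 243 pages would take a bit under two and a half hours, which is the wait I was hoping to avoid.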

I wanted something more human readable though, and something that I wouldn’t have to wait a few hours to download every time. Then it hit me to try fetching the regular web pages from Twitter’s HTTP server instead of going through the API.

Very crudely I patched together some PHP code (helped a lot by this get_web_page.php script) that simply fetched and displayed each consecutive page of my Twitter archive. I found it was necessary to strip out all the HTML apart from the status updates to prevent Firefox from locking up, but in the end I had a neat little 2.36MB HTML file with all my Twitter updates from the last year-and-a-half-and-a-bit.
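The stripping step is the fiddly part, so here is a minimal Python sketch of the idea. It is not the PHP I actually used. It assumes each status update sits inside an element with the class `entry-content`, which is a guess about the page markup that you would need to check against the real HTML; `StatusExtractor` and `extract_statuses` are made-up names for this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class StatusExtractor(HTMLParser):
    """Collect the text inside every element carrying a given CSS class."""

    def __init__(self, css_class="entry-content"):  # class name is an assumption
        super().__init__()
        self.css_class = css_class
        self.depth = 0        # how deep we are inside a matching element
        self.statuses = []    # one string per status update found

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # nested tag inside a status, e.g. a link
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.statuses.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.statuses[-1] += data


def extract_statuses(page_html, css_class="entry-content"):
    """Return the plain text of each status update on one archive page."""
    parser = StatusExtractor(css_class)
    parser.feed(page_html)
    parser.close()
    # Collapse runs of whitespace so each status is a single tidy line.
    return [" ".join(status.split()) for status in parser.statuses]
```

Feeding each downloaded page through `extract_statuses` and writing the results into one file gives a page with nothing but the updates, which is what stopped the browser from choking.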

Here’s the full HTML output, the zipped HTML, and the small chunk of alpha code I strung together. Feel free to use it and build upon it. It’s mainly just a proof of concept that could be developed further in the future. Let me know if you’d like to give us a hand.

Sweet! Now we’re all backed up. If Twitter’s servers go into meltdown, I’ll still be able to see that around this time last year I was wondering why it always rained on me.

Until next time.
