HTML5 Briefing Notes

HTML5 has been around for a while, but it seems to have become much more of a force over the last couple of years. The HTML5 Doctor blog has a great overview for a general audience: HTML5 Briefing Notes for Journalists and Analysts.

I admit to not being entirely up to speed with HTML5 and related technologies. I’ve tried my hand at some SVG but haven’t delved into CSS3 or any of the game and animation enhancements. I’m pretty interested in expanding my knowledge in this area and I’m hoping to put together some after-hours experimental apps to try some of this stuff out.

I was interested in the discussion at the end of the article about the relationship between HTML5 and native phone apps. It seems like HTML5 is a better learning investment than iOS or Android as it targets all platforms.

Wildcard string matching on Linux

If you’re on Linux, the GNU C library has a handy function for wildcard matching called fnmatch. I notice it’s also supported in Python, PHP and Perl, and probably in a few other scripting languages.

I’m amazed that I haven’t needed to do this before, but this year I had a case where a list of strings needed to be able to contain wildcards. At first I implemented this myself, as I only needed to support a wildcard at the ends of strings, but once I started using it I wanted to allow wildcards in the middle of strings too. That’s the point at which I realised I’d need some kind of stack, parser and grammar, so after 3 seconds on Google I found fnmatch. It’s times like this you wonder why you didn’t just look it up in the first place.
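
As a rough illustration of the kind of matching described above, here is a minimal sketch using Python’s standard fnmatch module. The pattern list, the sample strings and the matches_any helper are all made up for the example; they are not taken from the original code.

```python
from fnmatch import fnmatchcase

# Hypothetical list of allowed names; entries may contain shell-style wildcards.
PATTERNS = ["exact-name", "log-*", "*.tmp", "cache-*-old"]


def matches_any(name, patterns=PATTERNS):
    """Return True if name matches at least one pattern (case-sensitive)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(matches_any("log-2011-05-06"))   # wildcard at the end of the pattern
print(matches_any("cache-users-old"))  # wildcard in the middle of the pattern
print(matches_any("something-else"))   # no pattern matches
```

fnmatchcase is used rather than fnmatch so the comparison stays case-sensitive regardless of the operating system.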

Postgres Table Sizes

Sometimes it’s handy to know the tables in your DB that are taking up the most space. Here’s a quick query that returns the table names and pages used in descending order:


SELECT table_name, relpages
FROM information_schema.TABLES
INNER JOIN pg_class
ON relname=table_name
WHERE table_schema='public'
ORDER BY relpages DESC

 

The reason for the join on the information schema is to limit the results to just the tables you created. If you query pg_class on its own, you also get the sizes of indexes and views, including the Postgres-owned objects.

Streaming MMS to iPhone

It’s possible to listen to an MMS stream on an iPhone/iPad using the TuneIn Radio app. Read on to find out how it’s done.

Like many hip and in-the-know people around Australia (and indeed the world), I tune in to The Thursday Morning Herald (posterous) with Sam Gunders on Phoenix Radio, an online-only station, at 8:45am EST (Thursdays, obviously).

However, due to school drop-off commitments, I’m often driving to work for most of the show. Thankfully, there is a cool new invention called 3G that allows the Internet to be accessed from mobile devices. Also, because I am a hip and in-the-know kind of guy, I have an iPhone and one of those car-stereo MP3-to-radio transmitter things that plugs into the cigarette lighter (aka “12V power outlet”) of the car.

The TuneIn Radio app is available in the Apple App Store in two flavours. For this exercise you’ll need the full version, which set me back a massive 99c this morning. With the full version of TuneIn installed, you can enter a custom URL: tap the search bar and press the ‘Custom URL’ button. At this point, you’ll want to double tap the home button and head over to Safari. Browse to the Phoenix Radio USQ page, tap and hold the “Listen Live” link and choose the ‘Copy Link’ menu option that pops up. Now double tap home to get back to TuneIn and tap the search box again. You should get a little ‘Paste’ popup button that will allow the MMS URL to be entered into the search box. (Or you can just copy and paste from here: mms://media.usq.edu.au/springfieldradio)

From here it’s a simple matter of tapping the ‘Custom stream’ search result which takes you to the player screen. Once on the player screen, you can tap the heart on the top right of the screen to save the radio station for faster access next time.

The ‘connecting’ bit can take a long time. I had to wait five minutes the first time I connected and three minutes on a later attempt, but once it started up I had no glitches at all and drove from North Ryde to Blacktown along the M2 with uninterrupted hip and grooving Thursday Morning Herald sounds.

Menus in Gvim under Ubuntu 11.04 (Natty Narwhal)

The text editor Gvim has menus that aren’t compatible with the new Unity window manager in Ubuntu 11.04 (Natty Narwhal). To get around this, you can tell Ubuntu to fall back to the old Gnome-style menus by setting the UBUNTU_MENUPROXY environment variable to an empty value:


UBUNTU_MENUPROXY= gvim filename.ext

 

DRBD

DRBD (Distributed Replicated Block Device) is one way to implement warm standby on Linux systems by replicating an entire disk partition over the network. As changes are made to the master server, the secondary servers are kept in sync. If there’s a failure, you can bring up one of the secondaries.

I’ve been playing with DRBD at work today and found it pretty simple to configure using the instructions in The Ubuntu Server Guide.

The biggest hassle I had was that I put data on the partition before I set up DRBD, so I had to resize the filesystem to make room for the metadata block that DRBD wants to put at the end of the partition. This was not a big deal in the end, and there is actually an option to put the metadata elsewhere.

We have some scripts here at work that manage bringing up any services that need the DRBD partition to be mounted. Unfortunately I can’t publish them here, but it is not too difficult to write one. The basic concept is to put the System V rc-style links in a different directory and loop through them once DRBD is online. Note that the mounted device is the /dev/drbdN device, not /dev/sdaN. In our case we have the database on the DRBD partition, so we don’t have to muck around with a separate database replication service.
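
To give a feel for the rc-style loop, here is a minimal Python sketch. It is not the script we use at work; the directory name in the usage comment and both function names are hypothetical, and it assumes the links follow the usual convention where start links begin with S followed by a two-digit priority.

```python
import os
import subprocess


def start_scripts(directory):
    """Return the S* entries in directory, sorted the way rc would run them."""
    names = sorted(n for n in os.listdir(directory) if n.startswith("S"))
    return [os.path.join(directory, n) for n in names]


def start_services(directory, run=subprocess.call):
    """Run every start link with the 'start' argument, in sorted order."""
    for path in start_scripts(directory):
        run([path, "start"])


# Hypothetical usage, once the DRBD device has been made primary and mounted:
# start_services("/etc/drbd-rc.d")
```

Passing the run function in as a parameter is just a convenience so the ordering can be checked without launching any real services.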

DRBD might even be a decent solution for offsite backups if you have admin rights on both hosts. Just keep everything you need to keep backed up on a special partition and it will sync in the background (It looks like there’s a setting to throttle the bandwidth).

One limitation is that you can’t mount the partition on the secondary DRBD nodes. You have to make them primary first and then you can mount them. I see that it’s possible to have dual primary nodes though.

Another gotcha seems to be that you must have drbd running to read from the partition so you can’t just grab the backup disk and boot from it.

ZeroMQ

There’s a newish IPC framework generating some buzz amongst my colleagues and their networks. I was on the phone to a colleague the other day and he started telling me about a new IPC library he’s using called 0MQ (aka ZeroMQ). It seems to do a lot of smart stuff for you and has obviously generated some enthusiasm amongst its adopters:

How to explain ØMQ? Some of us start by saying all the wonderful things it does. It’s sockets on steroids. It’s like mailboxes with routing. It’s fast! Others try to share their moment of enlightenment, that zap-pow-kaboom satori paradigm-shift moment when it all became obvious. Things just become simpler. Complexity goes away. It opens the mind. Others try to explain by comparison. It’s smaller, simpler, but still looks familiar. Personally, I like to remember why we made ØMQ at all, because that’s most likely where you, the reader, still are today.

Is it worth checking out or is it just another programming fashion / fad? The website is http://www.zeromq.org/.

Time Limiting With SQL

When writing SQL queries it’s important to keep in mind the cost of the functions and operators you’re using. For example, using

date_trunc('day', timestamp_column) = '$day'
 
is much slower than using
timestamp_column BETWEEN '$day' AND timestamp '$day' + interval '1 day'
 

It may seem obvious but sometimes it’s good to experiment, even doing stupid things just to know what the impacts are.

I was writing a query today to get log data limited to just a certain day. The table in question has a “timestamp without time zone” type column called create_time. I wanted to write a query to give me all the rows that matched a certain day given as a string, e.g. '2011-05-06'. Usually I would use the BETWEEN operator on the create_time column like this:


SELECT blah FROM log
WHERE create_time
    BETWEEN '$day' AND timestamp '$day' + interval '1 day'

 

But I’ve been doing a bit of date munging using date_trunc, so in my first attempt at hacking up this query I had written:


SELECT blah FROM log
WHERE date_trunc('day', create_time) = '$day'

 

While the second version looks a bit neater, it actually has very crappy performance because it involves doing a conversion before each comparison, whereas the first version using BETWEEN only has to do a comparison (it does a couple of conversions once at the start of processing to turn the strings into timestamps).

The time difference between these two queries is <1ms vs 700ms BTW so the first version is definitely the right choice.
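
One thing to watch with BETWEEN is that it includes both endpoints, so a row stamped exactly at midnight of the following day would also match. A half-open range avoids that while still comparing the column directly. Here is a minimal Python sketch of building the two bounds on the application side; the day_bounds helper and the %s placeholder style (as used by drivers such as psycopg2) are assumptions for the example, not part of the original app.

```python
from datetime import datetime, timedelta


def day_bounds(day):
    """Return (start, end) datetimes covering one calendar day.

    day is a string such as '2011-05-06'. The range is half-open:
    start is included, end (midnight of the next day) is excluded.
    """
    start = datetime.strptime(day, "%Y-%m-%d")
    return start, start + timedelta(days=1)


# The column is compared as-is, so no per-row conversion is needed.
# The %s placeholders are an assumption about the database driver.
QUERY = "SELECT blah FROM log WHERE create_time >= %s AND create_time < %s"

start, end = day_bounds("2011-05-06")
print(QUERY)
print(start, end)
```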

The next query has got me a little more stumped though: I need to get a list of the days for which data exists. The query I’m using is:


SELECT cast(create_time AS date) AS day FROM log
GROUP BY day
ORDER BY day

 

Which takes over a second to run. I don’t currently have the option of adding an index to this table on this particular server so I just have to put up with the very bad performance of this query. Or is there some trick that will get me a faster result?

CodeIgniter Database Memory Use In Long Running Scripts

PHP and associated frameworks aren’t optimised for running long scripts so when your app needs to do a lot of data crunching, there are often memory problems. In this case, I discovered CodeIgniter was keeping all my queries tucked away in its internal structures. Apparently it does this so you can access profiling information when the script finishes.

I was working on my hacky app for analysing a log file today when I loaded in a biggish log file and hit the dreaded memory limit exceeded problem:

PHP Fatal error: Allowed memory size of 33554432 bytes exhausted

I googled and stackoverflowed for PHP memory profiling solutions and pretty much saw only two options: xdebug and the built-in memory_get_usage(). I considered using xdebug but in this case it would mean installing it on a production server and potentially breaking things or else copying some big files to the office and trying to reproduce the problem on our development server which would also be a big hassle. I was sure I could find the issue more easily than that. Using memory_get_usage in a log statement, I was able to get some clues as to what was going on but it’s hard to use this as well because while I could see where the memory was growing, I couldn’t tell if the large arrays were being garbage collected at a later stage (which I’m pretty sure they were).

The way the log file processing works is like this. I iterate through the file a line at a time and grep for markers that let me see what type of log message it is. Depending on the grep result, I call into the appropriate method for parsing the log message and storing the result into the matching array for that log message type. Once any of these arrays gets to a certain number of elements, I call into the method which creates a bulk insert to the DB and clears out the array.
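
The batching flow described above can be sketched roughly as follows. This is a Python illustration of the idea rather than the PHP from the app; the BatchBuffer class, the batch limit and the flush callback are all hypothetical stand-ins for the per-type arrays and the bulk insert method.

```python
class BatchBuffer:
    """Collect parsed rows per message type and flush them in bulk."""

    def __init__(self, flush, limit=500):
        self.flush = flush   # callback taking (message_type, rows)
        self.limit = limit   # hypothetical batch size
        self.rows = {}       # message type -> list of pending rows

    def add(self, message_type, row):
        pending = self.rows.setdefault(message_type, [])
        pending.append(row)
        if len(pending) >= self.limit:
            self.flush_type(message_type)

    def flush_type(self, message_type):
        pending = self.rows.get(message_type)
        if pending:
            self.flush(message_type, pending)
            # Rebind to a fresh list so the old one can be freed.
            self.rows[message_type] = []

    def flush_all(self):
        for message_type in list(self.rows):
            self.flush_type(message_type)


flushed = []
buffer = BatchBuffer(lambda kind, rows: flushed.append((kind, len(rows))), limit=2)
for kind, row in [("tagset", 1), ("tagset", 2), ("niotag", 3), ("tagset", 4)]:
    buffer.add(kind, row)
buffer.flush_all()  # write out whatever is left at end of file
print(flushed)
```

The final flush_all call matters: without it, any rows still sitting in a partially filled batch at the end of the file would never reach the database.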

As best as I could tell, the big arrays would have been marked for garbage collection at this point in my method:


$this->tagset_events = array();
/* the former contents of tagset_events should
now be free for garbage collection */


 

After a bit of reading I came across this StackOverflow post which mentioned the database library and that got me thinking about whether CodeIgniter was keeping my queries hanging around and using up all my memory.

As a first experiment, I changed my bulk insert calls to use simple_query() instead of query() and ran my script again. The results were immediately a success! The script finished running and the memory use reported by the profiler was only 3MB as opposed to 32MB. From this I am concluding that CodeIgniter does keep your SQL queries tucked away somewhere by default, which can use up memory if you’re processing a lot of queries (in this case it was in the order of 3000 queries, each seeming to use about 8K of memory).

Optimising and Profiling a CodeIgniter App

CodeIgniter has a handy profiling class which automatically times queries and can be used to time blocks of code. There is no need to live with the pain of a slow app when often you can get improvements in the order of 100 times faster just by adding an index.

I have a hacked together CodeIgniter (aka CI) based web app here at work that I use to analyse some logfiles. The basic workflow of this app is to grep through the log files looking for certain events and create database rows for these events that I can then run queries against and produce a graphical representation of what’s going on (I use SVG for the graphics but I’m not going to talk about that in this post).

Since I wrote this app, I’ve had abysmally bad performance which has gradually worsened as I’ve added more log data to the system. Today it got bad enough that I decided to bite the bullet and sort the mess out.

My first attempt to locate the issue was to place log comments with time stamps at various points in the code. By looking at the timestamps on these messages in the CodeIgniter log, I was able to locate the problem methods in my app. It turned out I had two methods that were causing the issues. I picked the first of these methods and put some debug messages in the inner for loop to see if I could break things down further but unfortunately, the granularity of the log timestamps is only one second so I couldn’t get any further using this method unless I started rolling my own timing methods.

Rather than rolling my own at this point, I decided to investigate the CodeIgniter profiling options, assuming it had some built-in help. It turned out that yes indeed, CI has a cool and useful profiling class built in. Not only does it automatically print the timings of your SQL queries and the overall page load time, it also lets you mark points in the code and print out the execution times of sections of code (these are called benchmark points in CI). I broke my problem loop into three timed sections and enabled the profiling as per the CodeIgniter User Manual.

At this point I saw that two of the sections in my loop had ~300ms execution time. I also noticed in the SQL query section that the queries launched by these methods had ~300ms execution time too, so I knew these queries were the problem: they were being called inside a for loop, so the 300ms was quickly multiplying out to tens of seconds in my total page load time.

The query in question looked like this:


SELECT value FROM niotag_events
WHERE logfile_id = 14 AND uri='our://funny/url/scheme/id/thing'
AND evtime < '2011-04-29 11:02:34.345' AND resource='M123B'
ORDER BY evtime DESC LIMIT 1

 

As it turns out, I hadn’t put any indexes on my tables, so it made sense that the queries were running slow. I’m not a database optimisation expert, so I just decided to add an index that covered the columns in my WHERE clause:


CREATE INDEX niotag_events_idx_1
  ON niotag_events
  USING btree
  (logfile_id, resource, uri, evtime DESC);

 

Just this index reduced the query to less than a millisecond and I immediately felt the page load speed halve. I then tackled three other queries on different tables using the same method with similar improvements.

The upshot of this exercise was that I saw my page load time go from 25 seconds to about 250ms, a 100-fold speed increase. Think about the minutes I was spending waiting for results of my log crunching app while I could have been clicking through to the next page to find what I was looking for. Think of the amount of boredom that was causing me and, even worse, the windows for distraction as I clicked over to Twitter or elsewhere just waiting for pages to load. This exercise was well worth the effort and the only question I have now is why didn’t I fix this sooner?
