Archive for May, 2011

Time Limiting With SQL

When writing SQL queries it’s important to keep in mind the cost of the functions and operators you’re using. For example, using

date_trunc('day', timestamp_column) = '$day'
 
is much slower than using
timestamp_column BETWEEN '$day' AND '$day' + interval '1 day'
 

It may seem obvious but sometimes it’s good to experiment, even doing stupid things just to know what the impacts are.

I was writing a query today to get log data limited to just a certain day. The table in question has a “timestamp without timezone” type column called create_time. I wanted to write a query to give me all the rows that matched a certain day given as a string e.g. ’2011-05-06’. Usually I would use the BETWEEN operator on the create_time column like this:


SELECT blah FROM log
WHERE create_time
    BETWEEN '$day' AND timestamp '$day' + interval '1 day'

 

But I’ve been doing a bit of date munging using date_trunc so the first attempt at hacking up this query I had written:


SELECT blah FROM log
WHERE date_trunc('day', create_time) = '$day'

 

While the second version looks a bit neater, it actually has very crappy performance because it involves doing a conversion before each comparison where as the first version using BETWEEN only has to do a comparison (It does a couple of conversions once at the start of processing to turn the strings into timestamps).

The time difference between these two queries is <1ms vs 700ms BTW so the first version is definitely the right choice.

The next query has got me a little more stumped though: I need to get a list of the days for which data exists. The query I’m using is:


SELECT cast(create_time AS date) AS day FROM log
GROUP BY day
ORDER BY day

 

Which takes over a second to run. I don’t currently have the option of adding an index to this table on this particular server so I just have to put up with the very bad performance of this query. Or is there some trick that will get me a faster result?

Leave a Comment

CodeIgniter Database Memory Use In Long Running Scripts

PHP and associated frameworks aren’t optimised for running long scripts so when your app needs to do a lot of data crunching, there are often memory problems. In this case, I discovered CodeIgniter was keeping all my queries tucked away in its internal structures. Apparently it does this so you can access profiling information when the script finishes.

I was working on my hacky app for analysing a log file today when I loaded in a biggish log file and hit the dreaded memory limit exceeded problem:

PHP Fatal error: Allowed memory size of 33554432 bytes exhausted

I googled and stackoverflowed for PHP memory profiling solutions and pretty much saw only two options: xdebug and the built-in memory_get_usage(). I considered using xdebug but in this case it would mean installing it on a production server and potentially breaking things or else copying some big files to the office and trying to reproduce the problem on our development server which would also be a big hassle. I was sure I could find the issue more easily than that. Using memory_get_usage in a log statement, I was able to get some clues as to what was going on but it’s hard to use this as well because while I could see where the memory was growing, I couldn’t tell if the large arrays were being garbage collected at a later stage (which I’m pretty sure they were).

The way the log file processing works is like this. I iterate through the file a line at a time and grep for markers that let me see what type of log message it is. Depending on the grep result, I call into the appropriate method for parsing the log message and storing the result into the matching array for that log message type. Once any of these arrays gets to a certain number of elements, I call into the method which creates a bulk insert to the DB and clears out the array.

As best as I could tell, the big arrays would have been marked for garbage collection at this point in my method:


$this->tagset_events = array();
/* the former contents of tagset_events should
now be free for garbage collection */


 

After a bit of reading I came across this StackOverflow post which mentioned the database library and that got me thinking about whether CodeIgniter was keeping my queries hanging around and using up all my memory.

As a first experiment, I changed my bulk insert calls to use simple_query() instead of query() and ran my script again. The results were immediately a success! The script finished running and the memory use reported by the profiler was only 3MB as opposed to 32MB. From this I am concluding that CodeIgniter does keep your SQL queries tucked away somewhere by default which can use up memory if you’re processing a lot of queries (in this case it was in the order of 3000 queries each seeming to use about 8K of memory)

Leave a Comment

Optimising and Profiling a CodeIgniter App

CodeIgniter has a handy profiling class which automatically times queries and can be used to time blocks of code. There is no need to live with the pain of a slow app when often you can get improvements in the order of 100 times faster just by adding an index.

I have a hacked together CodeIgniter (aka CI) based web app here at work that I use to analyse some logfiles. The basic workflow of this app is to grep through the log files looking for certain events and create database rows for these events that I can then run queries against and produce a graphical representation of what’s going on (I use SVG for the graphics but I’m not going to talk about that in this post).

Since I wrote this app, I’ve had abysmally bad performance which has gradually worsened as I’ve added more log data to the system. Today it got bad enough that I decided to bite the bullet and sort the mess out.

My first attempt to locate the issue was to place log comments with time stamps at various points in the code. By looking at the timestamps on these messages in the CodeIgniter log, I was able to locate the problem methods in my app. It turned out I had two methods that were causing the issues. I picked the first of these methods and put some debug messages in the inner for loop to see if I could break things down further but unfortunately, the granularity of the log timestamps is only one second so I couldn’t get any further using this method unless I started rolling my own timing methods.

Rather than rolling my own at this point, I decided to investigate the CodeIgnitor profiling options, assuming it had some built in help. It turned out that yes indeed CI has a cool and useful profiling class built in. Not only does it automatically print the timings of your SQL queries, and overall page load time, it lets you mark points in the code and print out the execution times of sections of code (these are called benchmark points in CI). I broke my problem loop into three timed sections and enabled the profiling as per the CodeIgniter User Manual

At this point I saw that two of the sections in my loop had ~300ms execution time. I also noticed in the SQL query section that the queries launched by these methods also had ~300ms execution time so I knew that these queries were the problem as they were being called inside a for loop so the 300ms was quickly multiplying out to tens of seconds in my total page load time.

The query in question looked like this:


SELECT value FROM niotag_events
WHERE logfile_id = 14 AND uri='our://funny/url/scheme/id/thing'
AND evtime < '2011-04-29 11:02:34.345' AND resource='M123B'
ORDER BY evtime DESC LIMIT 1

 

As It turns out, I hadn’t put any indexes on my tables so it made sense that the queries were running slow. I’m not a database optimisation expert so I just decided to add an index that covered the columns in my where clause:


CREATE INDEX niotag_events_idx_1
  ON niotag_events
  USING btree
  (logfile_id, resource, uri, evtime DESC);

 

Just this index reduced the query to less than a millisecond and I immediately felt the page load speed halve. I then tackled three other queries on different tables using the same method with similar improvements.

The upshot of this exercise was that I saw my page load time go from 25 seconds to about 250ms, that’s a factor of a 100 times speed increase. Think about the minutes I was spending waiting for results of my log crunching app while I could have been clicking through to the next page to find what I was looking for. Think of the amount of boredom that was causing me and even worse, the windows for distraction as I clicked over to twitter or elsewhere just waiting for pages to load. This exercise was well worth the effort and my only question I have now is why didn’t I fix this sooner?

Leave a Comment