Archive for April, 2011

Counting Multiple Statistics in One Query

When constructing reports of grouped time series data from SQL tables, you can use inner joins and sub queries to make up the results tables you need and avoid collating the result arrays in PHP.

In a previous post, I mentioned that I was not entirely happy with the way I had collated the results of a trend history view for an error reporting app I was hacking together. The problem was that I needed to have a count of error and count of total throughput grouped by time so I could show the error percentage and raw error counts changing over time. I couldn’t find a way to tell SQL to count total throughput and just the errors in one query. I was thinking about this in the shower (which is where the best ideas always form) and I realised I could use sub queries and inner joins to get the data and avoid the messy array operations that I was doing in PHP. Here’s my query now:


SELECT
            a.tdlday, a.total, b.count
            FROM  (
                SELECT date_trunc('day', create_time) AS tdlday,
                    count(*) AS total
                FROM error_log
                WHERE create_time BETWEEN NOW() - interval '14 days' AND NOW()
                GROUP BY tdlday
                ) AS a
            INNER JOIN (
                SELECT date_trunc('day', create_time) AS tdlday2,
                    count(*) AS count
                FROM error_log
                WHERE create_time BETWEEN NOW() - interval '14 days' AND NOW()
                    AND $error_criteria
                GROUP BY tdlday2
                ) AS b ON b.tdlday2=a.tdlday
            ORDER BY tdlday ASC

 

As you can see, the query is two sub queries with an inner join. The first sub query gives me the totals and the second gives me the error count. The join is on the date group. This allowed me to remove the second query from my PHP, five lines of expensive array rejiggering and the expensive lookups in the view when calculating the percentages.

Leave a Comment

Quick and dirty bar graphs using PHP and CSS

You don’t have to load in bitmap / SVG / javascript based graphing libraries to add visual queues to your reports. You can get a long way using CSS.

I’ve recently been delving into some data analysis for a customer to get trends of error codes from a field device. The customer was concerned that the error rate is increasing. These things are sometimes hard to visualise because there is plenty of variation from day to day. For the sake of keeping my blog active I thought I’d note a few things and if anyone out there comes across this, maybe you could give some thoughts in the comments.

I decided to make a quick and dirty web app to visualise the data. I chose CodeIgniter because it seems to be pretty popular and I’ve worked with it before. CodeIgniter uses an MVC approach but in this case I haven’t much use for the ‘M’ straight up so I start with a controller that will give me the error percentages for each of the error codes over the last 24 hours. Here’s how I do the query:


SELECT error_code, count(*)
          FROM error_log
          WHERE create_time BETWEEN NOW() - interval '1 day' AND NOW()
          GROUP BY error_code
          ORDER BY error_code

 

(I’m using postgres BTW and haven’t paid too much attention to portability because this is a hack)

In the view, I print a table of the error codes and the counts for the last 24 hour period. The last column of the table is a basic bar graph which helps to immediately pick out the red flags.

I write the text for the table using the rows returned by my query which is simple enough, open a table tag, the do a foreach loop over the result rows generating and tags as approriate and close the table after the loop. The percentages are being calculated against the total throughput of items which I obtained in a separate query.

To draw the last column, I did something a little special. First for the sake of scale, I iterated through the array earlier to obtain the maximum percent value (I also incidentally used this loop to get the value for the total row):


$error_total = 0;
$bar_max = 0;
foreach ($rows as $row) {
    $error_total += $row['count'];
    $bar_max = max($row['count'], $bar_max);
}

 

Inside the main foreach loop, I make use of a little helper I wrote:


<td><?php echo bar($row['count']/$bar_max); ?></td>

 

The helper looks like this:


<?php                                 
function bar($fraction)
{       
   $res = '<div class="outerbar"><div class="bar" style="width:';
   $res .= round($fraction*100);
   $res .= '%;" >&nbsp;</div></div>';
   return $res;                       
}           
function vbar($fraction)
{
   $res = '<div class="outervbar"><div class="vbarspace" style="height:';
   $res .= round((1-$fraction)*100);
   $res .= '%;" >&nbsp;</div><div class="vbar" style="height:';
   $res .= round($fraction*100);
   $res .= '%;">&nbsp;</div></div>';
   return $res;
}       
?>     

 

(Don’t forget you have to load helpers with a $this->load->helper(‘bar_helper’);)

The ‘bar’ function creates a couple of nested divs and uses the style element to set the width of the bar. In the CSS you just need to style the outer bar to have the width you want and the inner bar to be the colour you want. Like this:


.outerbar {
    width: 200px;
}
.bar {
    background-color: #f00;
}

 

Some readers may notice that my table also links to trends and that the bar_helper file also had a vbar function. The trends were slightly more difficult. They look like this:

I had to do a bit more work to get the trends right. First the query:


SELECT
            date_trunc('day', create_time) AS tdlday, count(*)
            FROM error_log
            WHERE create_time BETWEEN NOW() - interval '14 days' AND NOW()
                AND error_code = '$trend_variable'
            GROUP BY tdlday
            ORDER BY tdlday ASC

 

This query gives me an array with a column of 14 days and a column of error counts. I did a second query to get the total throughputs for the day. Things got a bit weird because on weekends we had a little bit of throughput but no errors so the two arrays had different numbers of rows. I got around this in a fairly expensive way by turning the second query result into a keyed array. This seems pretty expensive and if I knew a way to do all this in one SQL query or simpler PHP it would be worth doing:


$totals_keys = array_map(
            create_function('$v', 'return $v["tdlday"];'), $totals);
        $totals_values = array_map(
            create_function('$v', 'return $v["count"];'), $totals);
        $data['totals'] = array_combine($totals_keys, $totals_values);

 

You can see I did two array_maps to split out the columns and then combined the arrays back into a keyed array. Now in the view, to access the throughput total for a day I need to do $totals[$errors[$ii][‘tdldate’]].

Another complexity of the view is that with the data in columns but the page view in rows, I had to iterate through the array once for each row. I came up with a dodgy way to save myself writing the loop four times by using an array lambdas, once for each row:


$row_lambdas = array(
    array('Bar',
        create_function('$ii,$errors,$totals,$max_bar',
            '$pc = $errors[$ii][\'count\']
                / $totals[$errors[$ii][\'tdlday\']] * 100;
            return vbar($pc/$max_bar);'
)
    ),

    array('Dates',
        create_function('$ii,$errors,$totals,$max_bar',
            'return substr($errors[$ii][\'tdlday\'],5,5);')
    ),

    array('Percent',
        create_function('$ii,$errors,$totals,$max_bar',
            '$pc = $errors[$ii][\'count\']
                / $totals[$errors[$ii][\'tdlday\']] * 100;
            return sprintf("%0.1f", $pc);'
)
    ),
    array('Throughput Totals',
        create_function('$ii,$errors,$totals,$max_bar',
            'return $totals[$errors[$ii][\'tdlday\']];')
    ),
    );
?>
<?php
foreach ($row_lambdas as $row_lambda)
{
    ?>
    <tr><td><?php echo $row_lambda[0];?></td>
     <?php
    for ($ii = 0; $ii < count($errors); ++$ii) {
        ?>
        <td><?php echo
           call_user_func($row_lambda[1],$ii,$errors,$totals,$max_bar); ?></td>
        <?php
    }
    ?>
</tr>
    <?php
}
?>

 

I know newer versions of PHP support closures instead of create_function but this is an oldish server.

The final trick to this little application was that when I did the vertical bar graph, the bars wanted to grow from the top of the cells down rather than growing up from the bottom. If you look at the vbar function above, you’ll see I use two inner divs to make the bars so that I fill the whole outer bar. The first bar is actually the space above the top of the bar and then the second bar is the actual coloured part of the bar.

Comments (1)

Design Patterns

I’ve been working my way through Wikipedia’s Cliff’s Notes to Design Patterns

It’s a pretty good light weight introduction to all the concepts, many of which I’m familiar with but haven’t given them a name before. In my career thus far I haven’t been as interested in the theory as just learning how to get things done and learning the technologies but I’m finding myself more and more drawn into thinking at a more theoretical level and trying to incorporate methodologies into my work. A lot of this is through participating (or trying to) on Stack Overflow where I see a lot of these terms mentioned.

Recently I’ve become aware that I’ve neglected to read some of the more important hacking theory books and while this hasn’t had an impact on my day to day work, it has limited my participation in certain areas. I guess this is a general problem when it comes to education and ignorance. The uneducated person doesn’t understand the value of education. It is only once educated that they can look back and see how they’ve grown. See also Dunning Kruger Effect

Another positive side effect of engaging with theory is that it can help reignite a passion for the art of programming. Sometimes with professional life and the concerns of day to day mundanity I forget that I actually got into programming because I love it! It’s challenging, stimulating and rewarding to make computers do things. The reality of everyday professional programming is that I’m not always pushing the envelope, breaking new ground or going boldly where no instantiated object has gone before but spending a bit of time reading about the technologies I’m using whether it be history or theory helps me feel more of a connection with my work and motivates me to enjoy it more.

Leave a Comment

Fastest ever Boot of Embedded Linux: 300 ms

Not bad: http://lwn.net/Articles/435854/ (may be behind paywall) or http://article.gmane.org/gmane.linux.embedded.celinux.devel/100


From: Constantine Shulyupin
Subject: Fastest ever Boot of Embedded Linux: 300 ms
Newsgroups: gmane.linux.embedded.celinux.devel
Date: 2011-03-28 20:40:42 GMT (3 days, 9 hours and 30 minutes ago)

Hello all,

Make Linux Software presents the fastest ever embedded Linux boot for 720 MHz ARM and NAND flash memory. Linux boot time is 300 milliseconds from boot loader to shell. The first goal of the project is to achieve a minimal boot time of a minimal but functional Linux system on common hardware. The second goal is to provide a platform for developing more
functional systems with an even minimized boot time.

Video of boot process: http://www.youtube.com/watch?v=747XLVbTgA4

Boot log with timestamps:

0.000 0.000: TI X-Loader 1.4.4ss Mar 26 2011 01:45:43
0.000 0.000: Optimised by www.MakeLinux.com
0.000 0.000: Loading
0.237 0.237: Running
0.237 0.000: CFG_LOADADDR=80008000
0.249 0.012: (int)CFG_LOADADDR=e321f0d3
0.276 0.027: Linux version 2.6.32 (const@...)
0.276 0.000: Starting application
0.296 0.020: BusyBox v1.16.2 hush – the humble shell

To learn more, please visit http://www.makelinux.com/emb/fastboot/omap

Thanks.


Constantine Shulyupin
http://www.MakeLinux.com/
Embedded Linux Systems,
Device Drivers, TI DaVinci

Leave a Comment