Jan 30

Using MySQL as a time series database to track my podcasts

Category: analytics

In my previous blog post I covered how I used InfluxDB and Grafana to do analytics on my podcast backlog, reaching the depressing conclusion that I wasn't in fact making progress on it. Oh my. If you look carefully, though, you'll also note that the time series ranged from the 15th of February 2016 to the 22nd of September 2016. You might also have noticed some unusually straight lines in the graph close-ups: those were gaps in the data, where Grafana was connecting the dots across periods when InfluxDB, for whatever reason, hadn't started. The reason the series ends in September is that by then InfluxDB was refusing to start at all, and it stayed dead until I got around to writing that blog post. So in late November 2016 I decided to try something different: to use MySQL as a time series database instead!

The idea here is actually really simple: create a schema with three fields: timestamp, key and value. InfluxDB supports much more than this, but its flakiness was annoying at times. Doubly so because, when it refused to start, I decided to try upgrading it instead, only to hit another error: you can't upgrade directly from InfluxDB 0.10 to 1.0. I'd have to get the 0.10 instance back up and running, export the data out, and then reload it into 1.0. That's probably much easier to do if your installation was working prior to the upgrade, which mine wasn't. So a rather simple schema it is.

Here’s what the podcast size table looks like under MySQL:

CREATE TABLE `podcast_size` (
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`key` varchar(150) NOT NULL DEFAULT '',
`value` int(11) DEFAULT NULL,
PRIMARY KEY (`timestamp`,`key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Nice, simple and easy. Let's check out the podcast age table:

CREATE TABLE `podcast_age` (
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`key` varchar(150) NOT NULL DEFAULT '',
`value` bigint(11) DEFAULT NULL,
PRIMARY KEY (`timestamp`,`key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
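
As an aside, getting a series back out of either table is just a matter of filtering on the key and ordering by the timestamp. Something like this does the trick (the key name here is only an example; the real keys come from the scripts further down):

SELECT `timestamp`, `value`
FROM `podcast_age`
WHERE `key` = 'some-podcast'
ORDER BY `timestamp`;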

Back to the table itself: it's identical to the size one except for the name and the `value` column, which is a `bigint` rather than an `int`. That should make it easy to handle. The next task is to start getting data into the system. I'd already built the code for InfluxDB and it was mostly working (size works at least; age needs some more work), so I figured I might as well re-use it. With InfluxDB I was using the influx CLI tool to quickly push the data into the system, but the format for MySQL is a little different. This time I decided to write a very simple PHP wrapper to suck the data in:

<?php
// Connect to the DB first as this is the cheapest operation.
$conn = mysqli_connect('localhost', 'username', 'password', 'analytics') or die('Failed to connect to database');

// Figure out the age and size of the podcasts. Each line of script
// output becomes one row's worth of VALUES (see insertData below).
exec('~/bin/podcast-age.sh', $podcastAge) or die('Unable to exec podcast-age command');
exec('~/bin/podcast-size.sh', $podcastSize) or die('Unable to exec podcast-size command');

// Pump them into MySQL :D
insertData($conn, 'podcast_age', $podcastAge);
insertData($conn, 'podcast_size', $podcastSize);

// And we're done!
mysqli_close($conn);

// Utility function to insert data.
function insertData($conn, $table, $data)
{
    foreach ($data as $datum)
    {
    $query = sprintf('INSERT INTO `%s` (`key`,`value`) VALUES (%s)', $table, $datum);

        $result = $conn->query($query);

        if (!$result)
        {
            printf("Failed to execute query: %s\n", $query);
            var_dump($conn->error_list);
        }
    }
}

This is actually quite simple: I connect to the database, run the podcast-age.sh and podcast-size.sh scripts, put each line of their output into a row, and splice that into an INSERT statement, much as I did with InfluxDB. The difference here is that it's PHP, and it takes a few extra lines to do it. Let's be clear: this is far from production quality and would need a lot more work to make sure malicious SQL couldn't run (don't run this on anything from the web!), but it quickly does what I need to start sucking in data.
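
If I ever wanted to harden it, a prepared statement version of insertData would look something like the sketch below. It's untested, and note it assumes the key and value have already been split apart, which the single-string script output above would need some parsing for:

<?php
// Sketch: a safer insert using prepared statements.
// Assumes $key and $value have already been parsed out of the script output.
function insertDataSafely($conn, $table, $key, $value)
{
    // Table names can't be bound as parameters, so whitelist them instead.
    if (!in_array($table, array('podcast_age', 'podcast_size'), true))
    {
        die("Unknown table: $table");
    }

    $stmt = $conn->prepare("INSERT INTO `$table` (`key`, `value`) VALUES (?, ?)");
    $stmt->bind_param('si', $key, $value);

    if (!$stmt->execute())
    {
        printf("Failed to execute query: %s\n", $stmt->error);
    }

    $stmt->close();
}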

Similar to the InfluxDB version, I also use cron to run this daily:

@daily php ~/bin/podcast-db.php > /tmp/podcast-db.log    2>&1

This then grabs my podcasts daily and puts them into MySQL for me. Of course, the fun part now is waiting a while so that there's enough data to do something useful with!
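
Once a few weeks of rows have built up, a query along these lines should give daily totals ready for charting (a sketch; I haven't got enough data to try it properly yet):

SELECT DATE(`timestamp`) AS `day`, SUM(`value`) AS `total_size`
FROM `podcast_size`
GROUP BY `day`
ORDER BY `day`;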