Jan 16

Tracking my podcast backlog with InfluxDB and Grafana

Category: analytics

I have a lot of podcasts in my backlog: over 2,000 episodes sit unplayed in my history, more than enough audio to go on for years. The problem was that I wasn’t sure if I was making any progress on my podcasts or slowly trending backwards (which I had been for years). Before I started at LinkedIn, my commute to work was generally a 15 minute train ride from downtown San Jose straight up North First Street. With the walking time (~10 minutes to/from stations), that landed me with less than 50 minutes of audio time each day. The problem, of course, was that the one podcast I was most behind on, Radio National’s Late Night Live, is 50 minutes long and at the time aired five nights a week (now it’s four; they dropped the Friday night “classic” show, which replayed episodes from their archive). At the time I think I was delayed to around 2009, maybe even as far back as 2007, but I didn’t have the data to see that. Now, a couple of years into LinkedIn, it felt like I was making progress, but show me the data!

The first step was to figure out a couple of things:

  1. How many episodes do I have for each podcast?
  2. What is the oldest episode I have for each podcast?

The first is important because it tells me whether I’m making progress or not: if I am, that count should be trending steadily down. The second tells me roughly what my velocity over time is for a given show. The second also gives a cleaner read on how many episodes I’m actually listening to over time, whereas with the first that gets a little messy since new episodes keep arriving. However, before I get started I need to install InfluxDB! To do this I used Homebrew, per the InfluxDB installation guide, and I was up and running pretty quickly.
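
With InfluxDB installed and running, the one piece of setup needed before any data goes in is a database to hold the measurements. A minimal sketch, assuming a default local install of the 1.x influx CLI (the database name matches what the rest of this post uses):

```shell
# Create the "podcasts" database that the later insert statements write into.
influx -execute 'CREATE DATABASE podcasts'
```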

How many episodes do I have for a podcast?

Now, I’ve used iTunes for the longest time to store my podcasts, first with an iPod and later an iPhone to listen to them. This means that my extensive podcast backlog lives inside my Mac’s Music directory. I did consider figuring out how to get this data from iTunes directly, however I couldn’t easily work that out and decided to look at the file system instead. As with many little hacks that I put together, this started off in bash with me trying to figure out how to list my podcasts. The iTunes podcast directory is ~/Music/iTunes/iTunes Music/Podcasts, so let’s check out what that looks like:

$ ls
Background Briefing - Program podcast
Big Ideas - Full program podcast
Catalyst
Correspondents Report - Full Program
Dr Karl on triplej
Future Tense - Full program podcast
Future Tense - full program
Late Night Live
Late Night Live - Full program podcast
Late Night Live - full program
Manager Tools
Manager Tools Basics
New Free Music
Ockham's Razor - Program podcast
Rear Vision - Podcast
Rear Vision - Program podcast
The Science Show - Full program podcast
triple j_ New Free Music
triple j_ New Music Podcast

The first thing that jumps out is that the data is not clean! At some point some of these podcasts had no suffix; there’s a “full program” suffix, a “Full program podcast” suffix, a “Podcast” suffix and a “Program podcast” suffix. The most complicated of all is the “New Free Music” podcast, which has three iterations: “New Free Music”, “triple j_ New Free Music” and “triple j_ New Music Podcast”. All noise! This one isn’t actually a podcast in the sense of episodes; it’s music that triple j provided, so I don’t delete these files but keep them around. Some of these naming changes were in fact reflected in iTunes, where some of these podcasts were doubled up, so even going back to the source of truth would have been complicated anyway (curiously, as I caught up on the various podcasts, iTunes fixed itself). Now, in each of these directories is a set of MP3 or MP4 files (Catalyst is a vodcast of the now defunct ABC TV science show). Let’s check out the Late Night Live directory:

$ ls Late\ Night\ Live
Late Night Live - 2010-11-05.mp3
Late Night Live - 2010-11-16.mp3
Late Night Live - 2010-11-17.mp3
Late Night Live - 2010-11-22.mp3
Late Night Live - 2010-11-23.mp3

As we can see there are a bunch of episodes in the directory so my aim is to count them. Of course the problem is that these are spread out across multiple directories, so we need to clean that up as well. What I ended up coming up with looked like this:

for i in *; do 
    echo "\"$(echo $i | cut -f1 -d- | sed -e's/ *$//')\"",$(ls "$i" | wc -l)
done | 
    awk -F, '{a[$1]+=($2)}END{OFS=",";for(x in a)print x, a[x]}' | 
    grep -v 'triple j' | 
    grep -v 'New Free' | 
    sort -t, -r -n -k2

I’ve expanded this out because otherwise it’d be a case of: holy Bash one-liner, Batman! Let’s start slowly taking this apart. The easy part is the for loop: “for i in *” takes all entries in the current directory (e.g. ~/Music/iTunes/iTunes Music/Podcasts) and iterates through them. I have a fun double echo that trims off everything from the first dash in the name (cut -f1 -d-), which is then piped to sed to trim any extra spaces at the end of the line. The second half does an ls on the directory and simply counts the lines returned. Let’s see what that looks like when run alone:

$ for i in *; do
>     echo "\"$(echo $i | cut -f1 -d- | sed -e's/ *$//')\"",$(ls "$i" | wc -l)
> done
"Background Briefing", 49
"Big Ideas", 787
"Catalyst", 180
"Correspondents Report", 11
"Dr Karl on triplej", 137
"Future Tense", 183
"Future Tense", 1
"Late Night Live", 5
"Late Night Live", 1055
"Late Night Live", 14
"Manager Tools", 453
"Manager Tools Basics", 22
"New Free Music", 82
"Ockham's Razor", 56
"Rear Vision", 19
"Rear Vision", 6
"The Science Show", 124
"triple j_ New Free Music", 260
"triple j_ New Music Podcast", 9

So it outputs a list of the folders and the count of how many entries are in each. You’ll notice that the names are cleaned up, but this resulted in duplicate entries for items like Late Night Live and Future Tense (Manager Tools and Manager Tools Basics are in fact distinct podcasts; I also keep all of the Basics around to re-listen to from time to time). This brings us to the awk line. awk is a language for processing text files, and in this case we’re piping the list into it and using it to aggregate the counts for each podcast. With awk, “-F,” says that the comma is the field separator. The first part of the code, “{a[$1]+=($2)}”, says to use the first field (in our case, the podcast name) as a key and add to it the value of the second field (the += operator). The “END” marks the end of the script, and that part runs once awk has finished processing the input. The ‘OFS=","’ tells awk to use the comma as the Output Field Separator, and then the for loop iterates through the array ‘a’, printing each name and its total. Let’s check out what that looks like:

$ for i in *; do
>     echo "\"$(echo $i | cut -f1 -d- | sed -e's/ *$//')\"",$(ls "$i" | wc -l)
> done |
>     awk -F, '{a[$1]+=($2)}END{OFS=",";for(x in a)print x, a[x]}'
"Correspondents Report",11
"triple j_ New Free Music",260
"Late Night Live",1074
"Background Briefing",49
"Manager Tools",453
"Rear Vision",25
"Dr Karl on triplej",137
"New Free Music",82
"Future Tense",184
"Manager Tools Basics",22
"triple j_ New Music Podcast",9
"Catalyst",180
"The Science Show",124
"Ockham's Razor",56
"Big Ideas",787

The next couple of lines use grep to exclude (-v) the three New Free Music entries, and finally the sort line reorders the output from most episodes to least. The “-t,” option sets the field separator to the comma again, “-r” does a reverse sort, “-n” sorts numerically, and finally the “-k2” option tells sort to use the second field, which in our case is the number of episodes:

$ for i in *; do
>     echo "\"$(echo $i | cut -f1 -d- | sed -e's/ *$//')\"",$(ls "$i" | wc -l)
> done |
>     awk -F, '{a[$1]+=($2)}END{OFS=",";for(x in a)print x, a[x]}' |
>     grep -v 'triple j' |
>     grep -v 'New Free' |
>     sort -t, -r -n -k2
"Late Night Live",1074
"Big Ideas",787
"Manager Tools",453
"Future Tense",184
"Catalyst",180
"Dr Karl on triplej",137
"The Science Show",124
"Ockham's Razor",56
"Background Briefing",49
"Rear Vision",25
"Manager Tools Basics",22
"Correspondents Report",11

We can see here that we’ve got a list of podcasts in a much cleaner format, with all of the variants of each name counted together properly. Great! Now that we have this we can push it into InfluxDB. I ended up saving this bash one-liner into a shell script that also includes a cd into “~/Music/iTunes/iTunes Music/Podcasts” in front, and called it “podcast-size.sh”. I then put this into another one-liner which pushes the output into InfluxDB:

~/bin/podcast-size.sh  | 
     sed -e's/ /\\ /g' | 
     sed -e"s/'//g" | 
     sed -e's/^\([^,]*\),\(.*\)/insert size,podcast=\1 value=\2/' | 
     /usr/local/bin/influx --database podcasts

Let’s take this apart step by step again. The output of “~/bin/podcast-size.sh” is the same as our one liner we took apart above. We then feed that into a sed that escapes spaces:

$ ~/bin/podcast-size.sh  | sed -e's/ /\\ /g'
"Late\ Night\ Live",1074
"Big\ Ideas",787
"Manager\ Tools",453
"Future\ Tense",184
"Catalyst",180
"Dr\ Karl\ on\ triplej",137
"The\ Science\ Show",124
"Ockham's\ Razor",56
"Background\ Briefing",49
"Rear\ Vision",25
"Manager\ Tools\ Basics",22
"Correspondents\ Report",11

We can see that the spaces are now all escaped. The next sed takes the single quotes and removes them (e.g. Ockham’s Razor):

$ ~/bin/podcast-size.sh  | sed -e's/ /\\ /g' | sed -e"s/'//g"
"Late\ Night\ Live",1074
"Big\ Ideas",787
"Manager\ Tools",453
"Future\ Tense",184
"Catalyst",180
"Dr\ Karl\ on\ triplej",137
"The\ Science\ Show",124
"Ockhams\ Razor",56
"Background\ Briefing",49
"Rear\ Vision",25
"Manager\ Tools\ Basics",22
"Correspondents\ Report",11

The next sed statement takes this output and then turns it into a query for influxdb:

$ ~/bin/podcast-size.sh  | sed -e's/ /\\ /g' | sed -e"s/'//g" | 
>     sed -e's/^\([^,]*\),\(.*\)/insert size,podcast=\1 value=\2/'
insert size,podcast="Late\ Night\ Live" value=1074
insert size,podcast="Big\ Ideas" value=787
insert size,podcast="Manager\ Tools" value=453
insert size,podcast="Future\ Tense" value=184
insert size,podcast="Catalyst" value=180
insert size,podcast="Dr\ Karl\ on\ triplej" value=137
insert size,podcast="The\ Science\ Show" value=124
insert size,podcast="Ockhams\ Razor" value=56
insert size,podcast="Background\ Briefing" value=49
insert size,podcast="Rear\ Vision" value=25
insert size,podcast="Manager\ Tools\ Basics" value=22
insert size,podcast="Correspondents\ Report" value=11

At this point we have statements that will insert into InfluxDB, and the pipe ends with the influx command, which runs them against the database. That solves counting just how many episodes we have; now, how do we figure out the age?
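
Before we do, here’s the counting half packaged up: a sketch of what podcast-size.sh might look like, reconstructed from the pieces above (the path is from my machine, and the function wrapper is my own addition to keep the cd contained):

```shell
#!/bin/bash
# podcast-size.sh (sketch): count episodes per podcast, merging the
# differently-suffixed folder name variants into one total each.

count_podcasts() {
    cd "$1" || return 1
    for i in *; do
        echo "\"$(echo "$i" | cut -f1 -d- | sed -e's/ *$//')\"",$(ls "$i" | wc -l)
    done |
        awk -F, '{a[$1]+=($2)}END{OFS=",";for(x in a)print x, a[x]}' |
        grep -v 'triple j' |
        grep -v 'New Free' |
        sort -t, -r -n -k2
}

count_podcasts "$HOME/Music/iTunes/iTunes Music/Podcasts"
```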

What is the oldest episode I have for a podcast?

Now that we’ve got the size sorted, the next problem is finding the oldest episode so I can chart how I’m going over time. Let’s look at our data again to see what we can do. Looking at the Late Night Live folder from above, the date of the episode is stored in the file name, so maybe we can parse that? Let’s take a look at another directory:

$ ls Late\ Night\ Live\ -\ Full\ program\ podcast/ | head -n 10
06_12_2011.mp3
09_12_2011.mp3
11th August  2016.mp3
12th November 2015.mp3
13 October, 2015.mp3
14 October, 2015.mp3
14_12_2011.mp3
14th April 2016.mp3
14th June 2016.mp3
15 December, 2015.mp3

Oh my, ok that’s a couple of different formats. Let’s see what it looks like at the bottom:

$ ls Late\ Night\ Live\ -\ Full\ program\ podcast/ | tail -n 10
Why we need not be afraid of AI; Kate Tempest poet now novelist.mp3
Winners and losers in Syria; Tap and gone - outsourcing Medicare; David Bradbury filmmaker.mp3
Winners of USB Stick_ Australian lives during WW1; Being Roma is not a crime..mp3
Yanis Varoufakis;Anne Summers;Peter W Singer.mp3
Yasmin Alibhai-Brown's 'Exotic England'.mp3
Yasmin Alibhai-Brown's 'exotic England' 1.mp3
Youth mental illness, Life of Gore Vidal; the lucky Irish.mp3
press freedom in Timor-Leste; Election Watch_ South Australia; the future of jobs.mp3
the end of the homosexual_ Indigenous memorials.mp3
the late Christopher Hitchens on his book, God is Not Great.mp3

Ok, that doesn’t have any timestamp in it at all. Let’s work on Plan B! Each file has a modification timestamp, so let’s use that. And that means it’s time for another bash one-liner:

for i in *; do 
    echo "\"$(echo $i | cut -f1 -d- | 
        sed -e's/ *$//')\"",$(date -j -f "%Y-%m-%d %H:%M:%S" "$(ls -ltT "$i" | tail -n1 | 
        awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}')" "+%s")
done | 
    grep -v 'triple j' | 
    grep -v 'New Free' | 
    sort -k1,2 -t, -r | 
    awk -F, '{data[$1] = $2}END{for (datum in data) printf "%s,%s\n", datum, data[datum]}'

You’ll notice that the structure is similar. We have the same for loop and we also get rid of the New Free Music podcast via a grep. But even unpacked, that’s one really long echo statement, though it’s doing a similar thing to the earlier one: in the first $() section we clean the podcast name, and then in the second one we have a whole heap more magic. Let’s take a closer look at that:

date -j -f "%Y-%m-%d %H:%M:%S" "$(ls -ltT "$i" | 
    tail -n1 | 
    awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}')" "+%s"

That’s still a really long line, so let’s keep unpacking as there is another $() section on the inside:

ls -ltT "$i" | tail -n1 | 
    awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}'

Ok, we’re at the simplest expression. Let’s set "$i" with i=Late\ Night\ Live\ -\ Full\ program\ podcast/ and see what this ls looks like (trimmed):

$ i=Late\ Night\ Live\ -\ Full\ program\ podcast/
$ ls -ltT "$i" | head -n10
total 59093880
-rw-r--r--@ 1 pasamio  staff  51294661 Dec 29 15:43:53 2016 John Olsen on his life and work..mp3
-rw-r--r--@ 1 pasamio  staff  51294898 Dec 28 18:32:53 2016 Helen Garner on love, life and loss. Author Margo Jefferson on growing up in 'Negroland'..mp3
-rw-r--r--@ 1 pasamio  staff  51294900 Dec 27 18:32:48 2016 Matti Friedman on the conflict that never ends in the Middle East; are robots coming for us_.mp3
-rw-r--r--@ 1 pasamio  staff  51294735 Dec 26 18:33:20 2016 The story behind Shakespeare's most powerful play..mp3
-rw-r--r--@ 1 pasamio  staff  51295714 Dec 22 18:32:56 2016 Randolph Stow, author and enigma, Inside America's notorious prisons.mp3
-rw-r--r--@ 1 pasamio  staff  51294647 Dec 21 18:32:58 2016 Contesting Henry VIII's will; modern day slavery.mp3
-rw-r--r--@ 1 pasamio  staff  51294654 Dec 20 18:33:03 2016 Donna Leon on her Venetian detective stories; Ian Buruma's memoir of his grandparents between the wars..mp3
-rw-r--r--@ 1 pasamio  staff  51294826 Dec 19 18:33:04 2016 The 'emotional toxicity' of neo-liberalism; memoirs of a Syrian architect.mp3
-rw-r--r--@ 1 pasamio  staff  51293799 Dec 15 18:33:15 2016 Stan Grant profile.mp3

So this is outputting the list of episodes with their full details (-l), sorted by time (-t) and including the full timestamp (-T). When we add the tail -n1, we get just the last line:

$ ls -ltT "$i" | tail -n1
-rw-r--r--@ 1 pasamio  staff  24486738 Nov 28 09:19:14 2011 Tuesday 20 December 2011.mp3

This then brings us to the awk statement, which in this case takes the date and reformats it into something structured slightly differently. There is a bit of magic to take the textual month name (e.g. “Dec” or “Nov”) and turn it into a number. We also drop all of the other fields and just stick to the date and time:

$ ls -ltT "$i" | tail -n1 | awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}'
2011-11-28 09:19:14

Ok, this looks like a timestamp format we can do something useful with! Let’s plug that back into our date command:

$ date -j -f "%Y-%m-%d %H:%M:%S" "$(ls -ltT "$i" | tail -n1 | awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}')" "+%s"
1322500754

So what’s happening here: “-j” tells date not to try to set the system date, ‘-f "%Y-%m-%d %H:%M:%S"’ specifies an input format that matches the string we just created from awk (Year-Month-Day Hours:Minutes:Seconds), the nested one-liner supplies the timestamp of the last podcast in the directory, and finally “+%s” tells date to output the converted time as a UNIX timestamp, which we saw above. When we take this back out to our for loop we can see that we get the output per folder again:

$ for i in *; do
>     echo "\"$(echo $i | cut -f1 -d- | sed -e's/ *$//')\"",$(date -j -f "%Y-%m-%d %H:%M:%S" "$(ls -ltT "$i" | tail -n1 | 
>     awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}')" "+%s")
> done
"Background Briefing",1391587601
"Big Ideas",1363224296
"Catalyst",1274612314
"Correspondents Report",1443281684
"Dr Karl on triplej",1391587599
"Future Tense",1341642574
"Future Tense",1442184087
"Late Night Live",1291342481
"Late Night Live",1322500754
"Late Night Live",1302780473
"Manager Tools",1454738366
"Manager Tools Basics",1463358681
"New Free Music",1350443540
"Ockham's Razor",1391587578
"Rear Vision",1472424144
"Rear Vision",1433654801
"The Science Show",1412456585
"triple j_ New Free Music",1274614610
"triple j_ New Music Podcast",1274614646
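
As an aside, the month-name arithmetic inside that awk statement can be checked on its own; match() returns the 1-based position of the abbreviation inside the string, and the (n+2)/3 maths turns that position into a month number:

```shell
# "Nov" starts at position 31 in the string, and (31+2)/3 = 11.
echo "Nov" | awk '{printf "%02d\n", (match("JanFebMarAprMayJunJulAugSepOctNovDec",$1)+2)/3}'
# prints 11
```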

The two greps remove the folders we don’t care about, and that brings us to the sort. Now, the last sort we used looked like this:

sort -t, -r -n -k2

And this sort looks like this:

sort -k1,2 -t, -r

It’s slightly reordered but looks similar. We have “-t,” and “-r” again, but this time we’re not sorting numerically; instead, “-k1,2” sorts on fields one through two, i.e. by podcast name and then by timestamp. What does this end up looking like?

$ for i in *; do
>     echo "\"$(echo $i | cut -f1 -d- | sed -e's/ *$//')\"",$(date -j -f "%Y-%m-%d %H:%M:%S" "$(ls -ltT "$i" | tail -n1 | 
> awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}')" "+%s")
> done |
>     grep -v 'triple j' |
>     grep -v 'New Free' |
>     sort -k1,2 -t, -r
"The Science Show",1412456585
"Rear Vision",1472424144
"Rear Vision",1433654801
"Ockham's Razor",1391587578
"Manager Tools",1454738366
"Manager Tools Basics",1463358681
"Late Night Live",1322500754
"Late Night Live",1302780473
"Late Night Live",1291342481
"Future Tense",1442184087
"Future Tense",1341642574
"Dr Karl on triplej",1391587599
"Correspondents Report",1443281684
"Catalyst",1274612314
"Big Ideas",1363224296
"Background Briefing",1391587601

In this case we’re seeing the output sorted in reverse by name, with “The Science Show” on top and “Background Briefing” at the bottom (reverse alphabetical), and if we look at “Late Night Live” we can see that it’s then sorted by time in reverse order. The last step is to plug in the awk statement, which this time simply keeps the last value seen for each name (“data[$1] = $2”). Since each podcast’s rows arrive newest first, the value that sticks is the oldest timestamp, and we then output it similarly to last time (except this time using printf):

$ for i in *; do
>     echo "\"$(echo $i | cut -f1 -d- | 
>     sed -e's/ *$//')\"",$(date -j -f "%Y-%m-%d %H:%M:%S" "$(ls -ltT "$i" | tail -n1 | 
>         awk '{printf "%d-%02d-%02d %s", $9, (match("JanFebMarAprMayJunJulAugSepOctNovDec",$6)+2)/3, $7,$8}')" "+%s")
> done |
>     grep -v 'triple j' |
>     grep -v 'New Free' |
>     sort -k1,2 -t, -r |
>     awk -F, '{data[$1] = $2}END{for (datum in data) printf "%s,%s\n", datum, data[datum]}'
"Correspondents Report",1443281684
"Background Briefing",1391587601
"Late Night Live",1291342481
"Manager Tools",1454738366
"Dr Karl on triplej",1391587599
"Rear Vision",1433654801
"Future Tense",1341642574
"Manager Tools Basics",1463358681
"Catalyst",1274612314
"Big Ideas",1363224296
"Ockham's Razor",1391587578
"The Science Show",1412456585

At this point we have output that looks very similar to what podcast-size.sh produced in the last example, so I saved this as a file called podcast-age.sh with two extra additions: a cd into “~/Music/iTunes/iTunes Music/Podcasts”, and setting “LANG=C” to ensure that the ls output is stable. This then uses the same influx-style command as before, except it inserts into “age” instead of “size”:

~/bin/podcast-age.sh | sed -e's/ /\\ /g' | sed -e"s/'//g" | 
    sed -e's/^\([^,]*\),\(.*\)/insert age,podcast=\1 value=\2/' | 
    /usr/local/bin/influx --database podcasts

At this point I put these two lines into a file and then used crontab to run it daily:

@daily /Users/pasamio/bin/podcast-tracker.sh > /tmp/podcast-tracker.log 2>&1
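
For completeness, podcast-tracker.sh is just the two insert pipelines back to back; a sketch reconstructed from the pieces above (paths are from my setup):

```shell
#!/bin/bash
# podcast-tracker.sh (sketch): push today's episode counts and oldest-episode
# timestamps into InfluxDB as the "size" and "age" measurements.
~/bin/podcast-size.sh | sed -e's/ /\\ /g' | sed -e"s/'//g" |
    sed -e's/^\([^,]*\),\(.*\)/insert size,podcast=\1 value=\2/' |
    /usr/local/bin/influx --database podcasts

~/bin/podcast-age.sh | sed -e's/ /\\ /g' | sed -e"s/'//g" |
    sed -e's/^\([^,]*\),\(.*\)/insert age,podcast=\1 value=\2/' |
    /usr/local/bin/influx --database podcasts
```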

Now every day I get a fresh set of entries in InfluxDB automatically. The next step is to visualise it!

Using Grafana to visualise the data

At this point I decided to use Grafana to visualise the data. Chronograf is the usual visualisation piece of the Influx “TICK” stack; however, Grafana also supports connecting to an InfluxDB instance. I used Docker to set up a Grafana instance on my Mac (docker run -i -p 3000:3000 grafana/grafana) and waited for it to start up. Once it was up and running I connected to it, headed to the Grafana menu on the top left and selected “Data Sources” to add a new data source. I named it “influxdb”, set it as the default and changed the type to “InfluxDB”. I filled in the URL to the instance using my internal static IP and set the database to “podcasts”. Clicking “Add”, the page reloads and tells me it successfully connected. Awesome!

Screenshot of Grafana's Data Source page set up for influxdb

The next step is to create a new dashboard: under the Grafana menu, expand “Dashboards” and click on “New”. Click on the “Graph” option that appears to get started with a new graph; it should pop up saying “No datapoints”. Click on the text “Panel Title”, which should display a menu with an “Edit” option. We need to change the default query that’s displayed at the bottom, so click on the query to expand the query builder. Click on the box labelled “select measurement”; I’m going to change it to “size” so that I can see the number of episodes over time. At this point I get some data, but it’s not really useful because it’s the mean of a set of values that doesn’t make much sense. Curiously, it looks like it’s trending the wrong way, though, after going the right way for a while:

Grafana dashboard showing mean size

That certainly seems interesting; however, as an aggregate it’s a bit hard to dig into. Let’s try changing the “mean()” to “sum()” to see what we get:

Grafana dashboard showing sum of podcasts

Ok, now that’s even more confusing: what’s going on in March? Let’s remove the “sum()”, clear out the group by and set it to “tag(podcast)”:

Grafana dashboard displaying number of episodes grouped by podcast

Now this looks mildly useful, although the graph itself is horridly presented! I can see two trends: the topmost line (which is Late Night Live) is slowly trending downwards. A great sign! However the next largest line, which is Big Ideas, is slowly trending upwards. Let’s go to “Display” and set “Fill” to zero to get rid of that annoying fill:

Screenshot of the Grafana display settings

A little bit more reasonable, but it’s still a little hard to see what’s going on for each podcast. While I’m clearly making progress on Late Night Live and Big Ideas is slowly creeping up, what about some of the smaller podcasts? Let’s use Grafana to select just “Correspondents Report”, “Ockham’s Razor” and “Rear Vision” to see what’s going on there:

Screenshot of Grafana dashboard with just a few podcasts selected

This shows me something I knew: I’m also getting through a bunch of other podcasts, and I’m actually up to date with some of them! Or at least I was at the end of July anyway. Now that we’ve drilled into the podcasts, let’s go back and figure out how to get our sum on properly. It turns out this is reasonably simple: we just need to change the group by to use “time(1h)” and we get a much more reasonable looking graph:

Screenshot of Grafana with the sum of the podcast sizes

Now this rather depressing-looking graph does tell me one thing: I’m going backwards and not making progress overall. While I’m making progress on one podcast (Late Night Live) that is significantly far behind, overall I’m adding more podcasts than I’m listening to. Now if we flip our chart over to age, we actually want this graph to slowly go up and to the right (since it’s a UNIX timestamp), and it shows me something else interesting:

Screenshot of Grafana displaying the podcast age

At the top you’ll notice a brand new green line: that’s when I started listening to a new podcast! It also shows at the bottom that I was making good progress on Late Night Live, though I stalled for a while and listened to The Science Show instead. Now, some of this is skewed because my input data is actually the modification time on the file, and it turns out that along the way the times got clobbered in a backup/restore, so the line stays flat even though I’m making progress on listening to podcasts. It also looks like I might have a few other issues, as other podcasts aren’t making progress even though I presumably caught up with them. The power of visualisation!

Now it looks like I have another mystery to solve: what’s going wrong with my podcast ages?
