Jan 9

Using Cacti to analyse your inbox

Category: analytics

For a few years now I’ve had a Cacti instance set up to monitor my inbox. It started ages ago when I realised I had a massive email backlog (over 9000 emails!) and I wanted to track my progress in clearing it. To do this I turned to a Cacti install I had set up to monitor an Airport Extreme that was my network gateway.

Cacti Email Statistics Graph

Here’s what that looks like for my unread email for the last day. You can see that email slowly creeps up overnight and then around 8am I woke up and read the email. This gives you an interesting insight into when you get email and when it gets read. So let’s get this set up!

Before we get to Cacti we need a way of getting the data. In my case I use Gmail, and Google provides an IMAP interface that I can use to get a count of my email. All I need is something to query it.

To gather the data from Gmail, I use a simple PHP script and the PHP IMAP extension to log in and read the folder counts. The script looks like this:

<?php
/* connect to gmail, defaulting to the 'Sent Mail' folder */
$hostname = '{imap.gmail.com:993/imap/ssl}[Gmail]/Sent Mail';
$username = 'username@gmail.com';
$password = 'password';

/* try to connect */
$mbox = imap_open($hostname, $username, $password) or die('Cannot connect to Gmail: ' . imap_last_error());

/* sent mail: only the total message count is reported */
$status = imap_status($mbox, $hostname, SA_ALL);
if ($status) {
    echo "sent:" . $status->messages . " ";
} else {
    echo 'sent:0 ';
}

/* output name => Gmail folder name */
$folders = array('allmail' => '[Gmail]/All Mail', 'drafts' => '[Gmail]/Drafts', 'inbox' => 'INBOX');
foreach ($folders as $key => $folder) {
    $status = imap_status($mbox, '{imap.gmail.com:993/imap/ssl}' . $folder, SA_ALL);
    if ($status) {
        /* total and unread counts, e.g. "inbox:58648 inbox_unread:0 " */
        echo $key . ':' . $status->messages . ' ' . $key . '_unread:' . $status->unseen . ' ';
    } else {
        echo $key . ':0 ' . $key . '_unread:0 ';
    }
}
imap_close($mbox);

Walking through the code, we connect to Gmail’s IMAP server and default to the “Sent Mail” folder. We use that first to figure out how many emails we’ve sent. From here I define an array called “folders” which maps the output name to the Gmail folder name. For each of these folders I get how many email messages are there and how many are “unseen”, or unread. Running this right now for my inbox looks like this:

$ php imap_cacti.php         
sent:15053 allmail:337306 allmail_unread:15911 drafts:117 drafts_unread:0 inbox:58648 inbox_unread:0

So I’ve sent only 15k emails (could have sworn I’ve sent more!) while I apparently have 337k emails in total, of which nearly 16k are unread. I have 117 draft emails that I’ve not sent (unsurprisingly these are all “seen”) and my inbox has 58k emails with none of them unread.

At the top of the post you’ll note that my inbox looks like it’s at zero: the graph reflects the inbox unread count, not the All Mail unread count. All Mail counts email that is not just in your inbox but in all labels, including ones you’ve manually set (e.g. you moved it from the inbox), ones set by filters you’ve created (e.g. a skip inbox rule) and automatically filtered content like the “Promotions” category. That 16k unread count is mostly my promotions folder, which I should really get around to cleaning out.

Right now we’ve got a script that can connect to Gmail and get our folder stats. Awesome! The next step is to hook this up to Cacti. You’ll notice that the PHP script prints space-separated key:value pairs. This is intentional: it’s the format Cacti expects from a script that returns more than one value.
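If you want to sanity check the output before pointing Cacti at it, here’s a minimal sketch of reading that key:value format back into an array. The function name `parse_cacti_output` is my own invention for illustration, not anything Cacti provides, and the sample line is just the output shown above with the trailing space the script leaves behind.

```php
<?php
// Hypothetical helper (not part of Cacti): parse the space-separated
// key:value pairs that the script prints into an array of integers.
function parse_cacti_output(string $line): array
{
    $fields = [];
    // Split on runs of whitespace and skip empty pieces, since the
    // script's output ends with a trailing space.
    foreach (preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // Each pair is expected to look like "name:value".
        $parts = explode(':', $pair, 2);
        if (count($parts) !== 2 || !is_numeric($parts[1])) {
            throw new InvalidArgumentException("Malformed pair: $pair");
        }
        $fields[$parts[0]] = (int) $parts[1];
    }
    return $fields;
}

// Sample taken from the run shown earlier in the post.
$sample = 'sent:15053 allmail:337306 allmail_unread:15911 drafts:117 drafts_unread:0 inbox:58648 inbox_unread:0 ';
$stats = parse_cacti_output($sample);

echo $stats['inbox_unread'], "\n"; // prints 0
echo count($stats), "\n";          // prints 7
```

If this throws on your own output, a folder name or a stray warning from PHP has probably leaked into the line, and Cacti is likely to have trouble with it too.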

To get started you’ll need a Cacti install. Most Linux distributions have a package available to get up and running. Cacti is one of the oldest tools for monitoring and it provides a neat infrastructure to gather metrics. It’s not going to scale to really large numbers but in my case of only needing to keep tabs on a couple of hosts it works perfectly. Once you’ve got Cacti up and running, log in to get to the admin panel.

Setting Cacti up to work with our custom data source requires a couple of configuration items to be set up. The first step is to create a new “Data Input Method”. On the left click on “Data Input Methods” and then click “Add” in the top right corner. This will take you to a screen that prompts for “Name”, “Input Type” and “Input String”. For the name, go with something descriptive like “Email Stats”; the “Input Type” should be set to “Script/Command” and then the “Input String” should look like the following:

/usr/bin/php /home/pasamio/cacti/imap_cacti.php

Obviously you need to replace “/home/pasamio/cacti/imap_cacti.php” with the path to where your PHP file lives. You’ll also need to make sure that the user Cacti is running under (either a “cacti” user or the Apache user like “www-data”) can access the PHP file to execute it. Once you’ve put that in click the “Save” button to continue to the next step.

At this point the page should reload and have two new sections: “Input Fields” and “Output Fields”. These are where we define the fields that our script is outputting. On the far right side of “Output Fields”, click on “Add”. This has three entries in it: “Field [Output]”, “Friendly Name” and “Update RRD File”. For the “Field [Output]” this should match the key value in our PHP script which in the first example is “sent”. The next field, “Friendly Name” gives us an easier view of the data so in here let’s just put “Sent” and finally “Update RRD File” should be ticked (this should be the default). Once you’ve put all of that in, click on “Create” to add the new field. You’re going to need to repeat the process for the rest of the fields (e.g. “allmail”, “allmail_unread”, etc) and your display should end up looking like this:

Cacti Data Input Methods Screen
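Since each “Field [Output]” entry has to match a key in the script’s output exactly, a quick comparison can catch typos before you move on. This is only a sketch: `field_mismatches` is a name I made up, the `$expected` list is what I typed into Cacti following the steps above, and the sample line stands in for whatever your script really prints.

```php
<?php
// Field names entered under "Output Fields" in Cacti (per the steps above).
$expected = ['sent', 'allmail', 'allmail_unread', 'drafts', 'drafts_unread', 'inbox', 'inbox_unread'];

// Hypothetical helper: list the names that appear on one side only.
function field_mismatches(string $output, array $expected): array
{
    // Capture the name part of every "name:value" pair.
    preg_match_all('/(\S+?):\S+/', $output, $m);
    $actual = $m[1];
    return [
        'missing_from_script' => array_values(array_diff($expected, $actual)),
        'missing_from_cacti'  => array_values(array_diff($actual, $expected)),
    ];
}

// Sample output from the script; substitute your own.
$sample = 'sent:15053 allmail:337306 allmail_unread:15911 drafts:117 drafts_unread:0 inbox:58648 inbox_unread:0 ';
$diff = field_mismatches($sample, $expected);

echo (empty($diff['missing_from_script']) && empty($diff['missing_from_cacti']))
    ? "fields match\n"
    : "mismatch: " . json_encode($diff) . "\n";
```

With the sample above this reports that the fields match. Anything listed under “missing_from_script” is a field Cacti will wait for and never receive.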

Ok, now that we’ve configured our data input, our next step is to create a “Data Template”. Click on “Data Templates” under “Templates” on the left and then click “Add” in the top right corner. We have to give a name for the template and the data source, so we’ll stick with “Email Stats” for both and then select “Email Stats” for the “Data Input Method”.

That brings us to the “Data Source Item” section. To get started, let’s plug in “sent” here as the “Internal Data Source Name”. I also set the “Maximum Value” to 0, though I can’t remember why. Once you’re done here, click “Create”. The page should reload and now under “Data Source Item” we’ll have an “Output Field” option. Select “sent” from the drop down and click “Save”.

Click on “new” on the right of the “Data Source Item” tab and create new names for each of the output fields (e.g. allmail, allmail_unread, etc). The process is: click “new”, put in the “Internal Data Source Name”, update the “Maximum Value”, match the data source name to the correct entry from “Output Field” and finally click “Save”. The page will reload and then you can start the process again by clicking “new”. I keep the internal data source name identical to the output field name it maps to. Once you’re done, it should look like this:

Cacti Data Template Screenshot

At this point we’ve created our “Data Input Method” and we’ve created a “Data Template”. The next step is to create a new “Data Source” for this input. Under “Management” click on “Data Sources” and then click “Add” in the top right. It’ll take us to “Data Template Selection”; here you should select “Email Stats”, set the host (I usually use “localhost”) and click “Create”. The page should reload with the path to an RRD file where it will be storing the data. Sweet! At this point Cacti should start gathering data to put into our RRD files. It’ll do this on the normal schedule, which is every five minutes by default, though your installation might be different. Until your RRD has a couple of points in it, it’s not going to be able to render anything, so now is a good time to take a break while your Cacti instance starts gathering data.

You waited ten minutes or so, right? Awesome! The next step is to create a “Graph Template”, so head back to “Templates” on the left, click on “Graph Templates” and then “Add” in the top right. We’ll give it the name “Email Stats – Unread”, set the title to the same “Email Stats – Unread” and scroll down to the bottom to click “Create”. It’ll reload the page with a new “Graph Template Items” section at the top. Click on “Add” here to get to the “Graph Template Items” page. For the “Data Source” drop down, select “Email Stats – (inbox_unread)”, select a colour from the next drop down (I went with a simple red, FF0000), set the “Graph Item Type” to “LINE1”, set “Text Format” to “Number of unread” and then click on “Create”.

We’re almost there! The last step is to click “New Graphs” on the top left corner, select your host as localhost and then using the create drop down select “Email Stats – Unread” and then click “Create”. The page should reload with a friendly confirmation “Created graph: Email Stats – Unread” at the top. Now we can click on “graphs” on the top and under “localhost” should be a graph titled “Graph Template: Email Stats – Unread”. Now if everything is working there should be a graph there and it should look something like this:

Cacti - Email Stats Unread Graph V1

If you don’t see a graph rendered there and you definitely waited long enough for the data to be populated (to be safe it should be at least half an hour), then you will need to head back to the “console”, click on “Graph Management”, find the graph in the graph list (e.g. “Email Stats – Unread”) and click on “Turn On Graph Debug Mode” to see the output under “RRDTool Says”. From here you should be able to figure out what you need to do next.

Right now we have a graph, and for me it’s empty because I’ve read all of my email (yay!). However, you’ll notice the original included some extra text in it. Let’s get that configured. Head back to “Graph Templates” and find the “Email Stats – Unread” template. Under “Graph Template Items”, select “Item # 1” and tick the “Insert Hard Return” box then click “Save”. Click on “Add” for the “Graph Template Items” again to add a new entry. Make sure the “Data Source” is still set to the “inbox_unread” item, change the “Graph Item Type” to “GPRINT”, set the “Consolidation Function” to “LAST”, change the “Text Format” to “Current:” and click “Create”. Repeat the same with the “Consolidation Function” set to “MAX” and the “Text Format” set to “Max:” to get a maximum entry, and then with “MIN” and “Min:” to get a minimum entry. Now when we go back to “graphs”, our graph should have a new line at the bottom with “Current”, “Max” and “Min”:

Cacti - Email Stats Unread Graph V2

The next step is to repeat this to create Graph Templates for each of the different data sources you’re interested in. You can also create a graph template that includes two different values on the same graph if you want, though I’ve not found that particularly useful for my own inbox. Finally, you can edit the original PHP script to add extra labels to your list to track how each of them is going and then wire this up to your data source in Cacti.

At the top of the post I mentioned that I’d had over 9k emails. When I went digging I found an early graph showing I had over 10k unread!
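As a rough sketch of adding labels, the snippet below extends the “folders” array from the script with a couple of made-up labels (“Receipts” and “Work/Project Alpha” are placeholders, use your own). The helper `label_to_key` is hypothetical. It assumes you keep the internal data source name identical to the output key as I do, in which case the key has to fit RRDtool’s rule for data source names: at most 19 characters from letters, digits and underscore. The “_unread” suffix takes 7 of those, so I cap the base key at 12.

```php
<?php
// Hypothetical helper: turn a Gmail label into a short output key.
// Assumes the key doubles as the RRD data source name (max 19 chars,
// [a-zA-Z0-9_]); "_unread" uses 7, leaving 12 for the base key.
function label_to_key(string $label): string
{
    $key = strtolower($label);
    $key = preg_replace('/[^a-z0-9]+/', '_', $key); // collapse other characters
    $key = trim($key, '_');
    return rtrim(substr($key, 0, 12), '_');
}

// The folder map from the original script.
$folders = array('allmail' => '[Gmail]/All Mail', 'drafts' => '[Gmail]/Drafts', 'inbox' => 'INBOX');

// Placeholder label names; replace these with labels from your account.
foreach (array('Receipts', 'Work/Project Alpha') as $label) {
    $folders[label_to_key($label)] = $label;
}

print_r(array_keys($folders));
```

Here “Work/Project Alpha” becomes “work_project”, so the script would print “work_project” and “work_project_unread” alongside the existing fields. Two labels that share their first 12 characters would collide under this scheme, so check the generated keys before adding the matching output fields in Cacti.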

Cacti Graph showing over 10k unread

You can see that I made progress rather rapidly and when I take a longer look at the data it really looks like a cliff:

Cacti Graph showing the rapid drop in my inbox

And also we can see that near Christmas I get plenty of time to clean up my inbox:

Cacti graph of the last 5 weeks of a year

One last interesting thing is that you can see when you get email and when it gets read. Here’s a view of a given day:

Cacti Graph showing a day’s worth of email unread

After a while you’ll hopefully have similar graphs that will tell their own story of your inbox.
