Mar 26

Printing out a copy of TapForms internal structure using CouchDB

Category: couchdb,tapforms

One of the other users of TapForms had a feature request for TapForms to be able to print out the structure of a TapForms document so that one could figure out where a field was being used within the document including calculation and link references. I thought that was an interesting enough problem to solve and using the CouchDB copy I figured I’d give it a shot. Behind the scenes, TapForms 5 uses Couchbase Lite to store it’s data on iOS and Mac. Couchbase Lite also comes with support for doing Peer-To-Peer synchronisation as well as IBM Cloudant and CouchDB. I’m using synchronisation with CouchDB to get access to the data whilst using TapForms as my primary data entry system. I’ve previously written about using importing receipts into TapForms from my Mac so this builds on this by building a better understanding of the underlying data structure.TapForms when connected to CouchDB and IBM Cloudant creates a new database with the name matching the internal unique identifier of the TapForms document. This is simply a UUID prefixed by “db-” that uniquely identifies the document for syncing. In my case the fuel tracking database I’m going to use has the ID “db-c1da6869786341efbd8dbe2021a8df72”. You can get from the TapForms “Database Documents” screen or the “Sync” tabs in “Preferences” for a document. To get started, we’ll create a new directory, create a new “fuel.ini” file and put these details into in an INI format:

dbName = 'db-c1da6869786341efbd8dbe2021a8df72'
dbHost = 'http://username:password@yourhost:5984'

Now we’re done that we need to figure out the forms and fields that are in the TapForms document. Since we’re using a single CouchDB document, it means that all of the metadata is stored as individual records. From the previous blog we know that records have a “rec” prefix but in this case we’re looking for the forms which have an “frm” prefix and their fields which are stored as “fld”.

Out of the box CouchDB doesn’t support any sort of filtering, instead you need to create views which are JavaScript functions that decide which documents should be included in the view. I didn’t really want to spend time setting up all of these views to get it all to work properly so I ended up using a special endpoint in CouchDB which gives me all documents in a database unsurprinsingly called “_all_docs”. This means I end up iterating through the entire database to figure out the structure but it’ll help me understand how everything comes together.

I’m going to use PHP to start processing the data but instead of making it a website, I’m going to make it a command line executable script called “build-tree.php”. I’m going to grab the INI file and then extract the config options as variables into the PHP space. We’ll then set up some basic variables that we’ll use to track the forms we have and also the table fiedls which are in a sense sub-forms. Finally we won’t pull everything in but instead we’ll use pagination so we set up some variables to handle pagination.

#!/usr/bin/php -n -d error_reporting=-1 -d display_errors=1
<?php

$configFile = "fuel.ini";
extract(parse_ini_file($configFile));

$forms = array();
$tableFields = array();

$limit = 1000;
$page = 0;
$total = 0;

Before we get too far along, we’re going to need to create a class to store the metadata about the form, I called this TFForm and let’s check that out:

class TFForm
{
    public $id;
    public $name;
    public $fields;


    public function __construct($id, $name = "", $fields = array())
    {
        $this->id = $id;
        $this->name = $name;
        $this->fields = $fields;
    }
}

It’s a pretty simple object where we store the ID of the form, the name of the form and the fields of the form. We have a simple constructor to get things going but that’s about it. Now that we’ve got the framework in place, let’s put this in a loop and start to get some data:

while ($total == 0 || $total > $limit * $page)
{
    $data = file_get_contents(sprintf('%s/%s/_all_docs?include_docs=true&limit=%s&skip=%s', $dbHost, $dbName, $limit, $page * $limit)) or die("Unable to connect to CouchDB\n");
    $response = json_decode($data);
    $total = $response->total_rows;
    $rows = $response->rows;

We’re going to have a while loop that checks to see if total is zero (just started) or if we’ve progressed beyond the total number of records ($limit * $page). Then we use file_get_contents and build a URL to talk to CouchDB. It gives us back a JSON response which we then decode into an object and extract out the total and the rows. Ideally we’d do some more error checking here beyond a simple die if the request fails but let’s move on.

    foreach ($rows as $row)
    {
        if ($row->id[0] == '-' || $row->id[0] == '_')
        {
            continue;
        }

        $prefix = substr($row->id, 0, 3);

Now w’re starting to iterate through the rows we got back and we do a simple check for two internal types. The rows prefixed with a dash are TapForms internal records for categories such as “-alerts-“, “-fakecategory-“, “-favourites-” and “-uncategorized-“. The rows prefixed with an underscore are generally CouchDB internal documents such as the design documents like “_design/main”. These aren’t generally relevant for our use case because they’re not only fixed for every TapForms database or they’re internal to CouchDB. The last thing we do here is to get the prefix of the row from the document. TapForms has a bunch of different prefixes so let’s switch through the various options. There are some that we’re going to ignore so let’s get that out of the way first:

        switch($prefix)
        {
            case 'sea': // Saved Search
            case 'pik': // Pick List
            case 'cat': // Category
            case 'itm': // Layout Item
            case 'lay': // Layout
            case 'rec': // Record
            case 'lnk': // Link
                // silently ignore these fields
                break;
            default:
                printf("Unknown prefix '%s' for record '%s'\n", $prefix, $row->id);
                break;

As you can see from the comments there are a bunch of prefixes and as a fall back we add a default to print out anything we didn’t understand. When I first started buidling this I started with the default item and then slowly added items to the list as I saw them. Ordinarily we’d put the default at the bottom of the switch statement but for the purposes of the blog I’ve switched it around. Let’s look at the handling we do for the ‘frm’ type for forms:

            case 'frm':
                // Create the form for the field if the form doesn't already exist.
                if(!isset($forms[$row->id]))
                {
                    $forms[$row->id] = new TFForm($row->id);
                }

                // If this form doesn't have a name, let's default it.
                if (!isset($row->doc->name))
                {
                    $row->doc->name = "[Untitled Form]";
                }

                $forms[$row->id]->name = $row->doc->name;
                break;

For the forms we check to see if we’ve set up this form already in the $forms array and if it doesn’t exist, we create a new entry of a TFForm object. We then attempt to validate if the name is set in this record to deal with some incomplete records and set it into the form object. In TapForms, the form object contains just the metadata of the form itself and other records store the individual fields. For our purposes all we need is the name for displaying later, the details about which fields are in the form comes from the fields themselves.

            case 'fld':
                // Table fields don't have forms but they have a tableField
                if(isset($row->doc->tableField))
                {
                    if(!isset($tableFields[$row->doc->tableField]))
                    {
                        $tableFields[$row->doc->tableField] = array();
                    }

                    $tableFields[$row->doc->tableField][$row->id] = $row->doc;
                    break;
                }

                // A field should always have a form!
                if(!isset($row->doc->form))
                {
                    printf("Ignoring record %s because it doesn't have a form set.\n", $row->id);
                    break;
                }

                // Create the form if it doesn't exist.
                if(!isset($forms[$row->doc->form]))
                {
                    $forms[$row->doc->form] = new TFForm($row->doc->form);
                }

                // Add the field to the form.
                $row->doc->id = $row->id;
                $forms[$row->doc->form]->fields[$row->id] = $row->doc;
                break;

This is a little bit more code to validate the field because it’s a slightly more complicated object. We start out by trying to validate if it’s a field within a table field. Table fields are mini-forms within a field inside another record and their metadata looks a little different to a normal field. I’m using a different array to track the table fields so that I can output them properly later. As table fields don’t have any name metadata to track (it’s implied by the parent table field), I’m just using a simple array to track the sub-fields within the table field. Once we’re dong processing this type we break the processing of the switch statement.

The next block of code tries to validate if there is a form set for the field. As I mentioned earlier, the form itself doesn’t track it’s fields rather the fields are associated to the form. However in my testing I found some records that didn’t have this set properly for what ever reason so I put in some code to detect this, print out a debug statement for later checking and ignore those records.

The third major block is similar to the code we had in the form section to create the form if it doesn’t exist. Since we can’t be guaranteed to get the form record before a dependent field record, we need to also create a TFForm object in the field code to make sure it’s present.

Finally we’re able to add the field to the form. We do a quick copy of the ID field from the row into doc and then put the raw field document into the fields array of the form.

At this point we’re at the bottom of our loop so we need to increment the page counter so we can move onto the next page:

        }
    }

    $page++;
}

Ok! So far we’ve pulled out all of the data from CouchDB and we’ve stuffed it inside a bunch of objects. Now we need to iterate through those forms and print out the fields. Let’s start out with a loop over the form objects and print out it’s name and ID:

foreach($forms as $formId => $form)
{
    printf("\n%s: (%s)\n", $form->name, $form->id);

Let’s move on to process each of the fields. Before we get started we need to sort them using the TapForms “sortOrder” field:

    // Sort the fields by the TF sortOrder
    uasort($form->fields, function($a, $b) { return $a->sortOrder > $b->sortOrder; });
    foreach($form->fields as $fieldId => $field)
    {

I’m using a PHP anonymous function to pull out the sortOrder field before we start iterating over it. Let’s keep moving:

        // Ugly hack to print out the section header (incl new line) or just a tab intent
        ($field->fieldType == 'section') ? printf("\n\t=== ") : printf("\t");
        printf("'%s' %s (%s)", $field->name, $field->fieldType, $fieldId);
        switch($field->fieldType)
        {

There’s a dirty hack here to print out a simple prefix for the section field type and then print out the field name, field type and field ID. We then switch on the field type to do further processing. In this situation there are a few field types we want to do custom processing for: “calc”, “table”, “form”, “from_form” and “section”. Let’s get started with the most complex item, the calc field:

            case 'calc':
                if (!isset($field->formula))
                {
                    printf(" - calculation field missing formula!\n");
                    continue;
                }

This is a very simple check to see if the formula is set for this field and if it isn’t set print a message then terminate the switch statement. This is necessary for calculation fields that don’t have a formula set.

                $matches = preg_match_all('/\[([^\]]*)\]/', $field->formula, $calcFields);
                if ($matches)
                {
                    printf("\n\t\tReferenced Fields: ");

Next step is to extract out the fields from the formula and if we find matches to start outputting them. This handles the situation where a formula doesn’t reference any other fields. Next step is to process the matches we find:

                    foreach(array_unique($calcFields[1]) as $calcField)
                    {
                        if (strpos($calcField, ':'))
                        {

We’re going to iterate through the fields and we’re looking for fields that have a colon in them. Fields with a colon in them a references to either another form or a table field within this form.

                            $pieces = explode(':', $calcField);
                            $linkFieldId = $pieces[2];
                            $targetFieldId = $pieces[4];
                            $linkField = $form->fields[$linkFieldId];

Next step is to split up the item and look for the link field as the first part and the target field as the second part. We then look into the form to find the original link form so that we can determine how to process the rest of the data:

                            if ($linkField->fieldType == 'table')
                            {
                                printf("\n\t\t - %s::%s (%s::%s via %s)",
                                    $linkField->name, $tableFields[$linkFieldId][$targetFieldId]->name,
                                    $form->id, $targetFieldId, $linkFieldId);
                            }

If this the original field is a table type then we need to look at the $tableFields figure out it’s metadata to print out.

                            else
                            {
                                $targetForm = $linkField->linkToForm;
                                $refForm = $forms[$targetForm];
                                printf("\n\t\t - %s::%s (%s::%s via %s)", $refForm->name, $refForm->fields[$targetFieldId]->name, $targetForm, $targetFieldId, $linkFieldId);
                            }
                        }

Otherwise this is another form so we need to find the other form from the linking field of this form and then pull out it’s details. Yikes!

                        else
                        {
                            printf("\n\t\t - %s (%s)", $form->fields[$calcField]->name, $calcField);
                        }
                    }
                }
                printf("\n");
                break;

Last but not least if this is a field in the same form, we can get it’s details from this form directly. Let’s check out the table type:

            case 'table':
                printf("\n");
                foreach($tableFields[$fieldId] as $subFieldId=>$subField)
                {
                    printf("\t\t- '%s' %s (%s)\n", $subField->name, $subField->fieldType, $subFieldId);
                }
                break;

This one is relatively straight forward, we look up the field for this table from the $tableFields object and then we iterate through each of them to print them out. The sub-fields within a table are a restricted set and are only simple types. You can’t create table fields or link fields within a table field. Last but not least we need to process the form link type:

            case 'form':
            case 'from_form':
                printf(" %s '%s' (%s)", $field->linkToFormType, $forms[$field->linkToForm]->name, $field->linkToForm);

                if(isset($field->joinFromField))
                {      
                    printf("\n\t\t\t ON %s.%s == %s.%s",
                        $forms[$field->form]->name,
                        $forms[$field->form]->fields[$field->joinFromField]->name,
                        $forms[$field->linkToForm]->name,
                        $forms[$field->linkToForm]->fields[$field->joinToField]->name);
                }

                echo "\n";
                break;

We print out the field details here for the field that we link to and if this is a JOIN type (as opposed to a 1:Many or Many:Many) then we also print out the join details in an SQL style format.

"
            case 'section':
                echo " ===";
            default:
                echo "\n";
                break;
        }
    }
}

The end of the file we handle the section header again and by default we put in a new line if there isn’t a special handling for this field type.

Let’s checkout what this looks like using the macOS Terminal app. We need to make it executable first and then we can run the program:

$ chmod +x build-tree.php
$ ./build-tree.php 

Entries: (frm-900c80d629cd41ffb9bf45f091381753)
	'Time stamp' date_time (fld-3e55053abb7d41dda065423e460befc1)
	'Location' location (fld-51db8bacdefd454e823d01d04bbcd4a0)
	'Gallons' number (fld-6661aa37ecb64729895cd12122d79a7b)
	'Cost/Gallon' number (fld-39643a2e20c740e4b178a4a8804bed28)
	'Total Paid' number (fld-c29d3fdc2566466f8dbc7a8f819cb551)
	'Calc Total Paid' calc (fld-c243871192014c958763d073966c553b)
		Referenced Fields: 
		 - Gallons (fld-6661aa37ecb64729895cd12122d79a7b)
		 - Cost/Gallon (fld-39643a2e20c740e4b178a4a8804bed28)
	'Trip Reading' number (fld-dbe3bd2f4959456fa1bf7d17fb73add8)
	'Calc MPG' calc (fld-226af22c51264504a9cebdc27693c77f)
		Referenced Fields: 
		 - Trip Reading (fld-dbe3bd2f4959456fa1bf7d17fb73add8)
		 - Gallons (fld-6661aa37ecb64729895cd12122d79a7b)
	'Odometer' number (fld-0039b34725e44a229d6a588718a8f9e3)
	'Range' number (fld-ab78424e581d47638845c94eb7a007ec)
	'Avg MPG' number (fld-682c667abd964895b97b78e196ba6281)
	'Avg MPH' number (fld-003afeb9aafc46d7a166328493960fe3)
	'Drive Time' number (fld-287134da336f41a8b4defffcc417e237)
	'Receipt' photo (fld-dc8a71d49f1e4f10836166b61e431116)
	'Notes' note (fld-9bb47ec5d9104cf4b5be308f181cfcc5)
	'Date Created' date_created (fld-3802fb867ee142128059eec0478f2349)
	'Date Modified' date_modified (fld-aa797def48cc446384a33e83929e98ea)

This is a simple example and it doesn’t handle all of the use cases but is a good example of host the calculation fields look like. Let’s pick a slightly more complicated example, my purchases database. I created a new “purchases.ini” file and put the details in for that and re-ran the program. Here’s an extract of the most complex form I have:

Purchases: (frm-efc0199a2b1543f79e722383014533b0)
	'Image 01' photo (fld-e631165b67374734a3b8f384708b5922)
	'Title' text (fld-0d0edd2552ea461e929f806a4e5552b5)
	'State' text (fld-9402f8c0d53c43b986fee4ebc3468929)

	=== 'Main' section (fld-76597ce17f924c25bbcb195df984331c) ===
	'Date Created' date_created (fld-0d049abe706b41afb680ab9a1bf99d46)
	'Date Modified' date_modified (fld-59a06347614e48e8bf547a855b781582)
	'Purchase Date' date (fld-ccbd9a8f51d34246bebfb31aa4e397dd)
	'Ship Date' date (fld-cb3a9886ac7f4ec487447801a3911a1a)
	'Received Date' date (fld-bb17d48e41c7423692ab586f6c884d05)
	'Category' text (fld-6fdd09891a8c4d73be1b24aa07d077be)
	'Order ID' text (fld-e3e66a0f2e5c4df7b9496f65355e0bcf)
	'Marketplace' text (fld-c163aba17ae64c4d93b5a53819a139dc)
	'Store Name' text (fld-3d98dc6cdcae4a88909c97c80dde7bfb)
	'Brand' text (fld-1e250019d7b249f282cc572814d3e71d)
	'Source' web_site (fld-da9d866bf3ca4d47aade04a77efd7301)
	'Item Key' text (fld-ae7379d699e9473aa2ab16a2a2f002d4)
	'Price' number (fld-08129d71ab0f4fa4a2749456281fca07)
	'Shipping Tracking Number' text (fld-6ea45a9c141343628940bfbcfa38ee90)
	'Shipping Carrier' text (fld-12644a7a4ae24ed8a7926123832d3557)

	=== 'Storage' section (fld-f99f779335f54b9cb0a4179a90bb97dd) ===
	'Bag Barcode' text (fld-32d459f0b5fb4dc4974795c484832af1)
	'Storage Box' text (fld-c08e3a9eb7784d7f8ee3a5576c0adffa)
	'Attributes JSON' text (fld-f95fdbc7b2de4b6e8c3efb46c4c5452b)

	=== 'Product Data' section (fld-5a42b3a215d947399c120078ea868672) ===
	'Notes' note (fld-bf19d52c18cb4f5198df191ef7902e1b)
	'Product Details' note (fld-f4e804c1869740a4bfd99a9adcfb3c49)
	'Attributes' table (fld-6b2e26f53e6c4f0fb7ebc14400b4f118)
		- 'Key' text (fld-1ff8c9d03e5e48beac09bdf639c0b286)
		- 'Value' text (fld-6a145374b8774cfca13fdc0c1756d00f)
		- 'Date Created' date_created (fld-ff31511cb4f54ce1b9f7ba85a8c0f43f)
	'Variant Data' table (fld-eb212e705eb34e9ea5cc4386ea7a9b1f)
		- 'Date Created' date_created (fld-482c2129901640299867923ced44ea01)
		- 'Value' text (fld-e4ce093c1c22416192eb80554272d6cd)
		- 'Key' text (fld-ecc1b1ede8414912a63ec144012fa9e9)

	=== 'Attachments and Relationships' section (fld-339f3e44f6f142ddb5ceb1df699f494d) ===
	'File Attachment' file (fld-c28b98cb3d8a43f98ee65503b30a6658)
	'Order Items' from_form (fld-4862767002cf4aadad853e78dffb2eed) manyToMany 'Order Items' (frm-7a809372942a4031ae4bdf014f69e99b)
	'Gallery' from_form (fld-41cea237bfa44135a788d09b2a390019) manyToMany 'Gallery Items' (frm-1dd95cbe70cc490c972487920317620c)
	'Purchases' form (fld-2dfc5804be564f17bcc3c66fd1080121) manyToMany 'Purchases' (frm-efc0199a2b1543f79e722383014533b0)
	'Shipments' form (fld-263993691f544158a16dd13fdf633562) toMany 'Shipments' (frm-ac823b8717fb428fa48b65b0efa4a2c3)

	=== 'Images' section (fld-642fef453ecb4d32b74d2e34993da182) ===
	'Image 02' photo (fld-28b749eeaff24492a730e427364ca683)
	'Image 03' photo (fld-1f0ae66bce7e4b63981b244e40ce4366)
	'Image 04' photo (fld-3cf1f20172104500af3070b96da2528e)
	'Image 05' photo (fld-0a8930a83d1f41b68012178ffe54d2ab)

	=== 'Linking Metadata' section (fld-fd8627a7114a431bb9199bdc2bd67ad8) ===
	'Purchase Key' calc (fld-8be9b2c2603f458f8349082237c41964)
		Referenced Fields: 
		 - Marketplace (fld-c163aba17ae64c4d93b5a53819a139dc)
		 - Store Name (fld-3d98dc6cdcae4a88909c97c80dde7bfb)
		 - Order ID (fld-e3e66a0f2e5c4df7b9496f65355e0bcf)
	'Shipment Key' calc (fld-7652273e8d02496f8f8ebc6c46d93230)
		Referenced Fields: 
		 - Shipping Carrier (fld-12644a7a4ae24ed8a7926123832d3557)
		 - Shipping Tracking Number (fld-6ea45a9c141343628940bfbcfa38ee90)
	'Modified Age' calc (fld-9e5bc8ce982c4a468c4b66cad92ecc5b)
		Referenced Fields: 
		 - Date Modified (fld-59a06347614e48e8bf547a855b781582)
	'Purchase Age' calc (fld-9675f0fb104847f0a9705da74bb8bd1a)
		Referenced Fields: 
		 - Purchase Date (fld-ccbd9a8f51d34246bebfb31aa4e397dd)
	'UUID' calc (fld-13790925689a4a189aee77b2c4d0fcb6)
	'Sample Calc' calc (fld-cf5f7b65e683481696a7e864ab39b3e5)
		Referenced Fields: 
		 - Title (fld-0d0edd2552ea461e929f806a4e5552b5)
		 - Order Items::Title (frm-7a809372942a4031ae4bdf014f69e99b::fld-39ca9564ef2347ac93f933bc9a2316ac via fld-4862767002cf4aadad853e78dffb2eed)
		 - Variant Data::Key (frm-efc0199a2b1543f79e722383014533b0::fld-ecc1b1ede8414912a63ec144012fa9e9 via fld-eb212e705eb34e9ea5cc4386ea7a9b1f)
		 - Purchases::Purchase Key (frm-efc0199a2b1543f79e722383014533b0::fld-8be9b2c2603f458f8349082237c41964 via fld-2dfc5804be564f17bcc3c66fd1080121)
	'Orders' from_form (fld-7bdce35a95dc42d596861eedf729eb73) join 'Orders' (frm-6c5adbc73d9f498c978b637c60d19561)
			 ON Purchases.Purchase Key == Orders.Purchase Key
	'Sync Toggle' check_mark (fld-b784d5a9b3bf435b93b71a20baa4d983)

You can see examples of the section headers, the various field linking types including the JOIN type and also I created a “Sample Calc” field with some different options on how you can pick out fields.

Next step from here? Refine the code, clean it up and start transforming this into much better organised code.

No comments

No Comments

Leave a comment

%d bloggers like this: