Dec 4

Adding a new dataset to ePrints 3.2

Category: eprints

For the last week or so I have been digging around ePrints 3.2, trying to work out how to add a dataset. I’d done this previously for the author ID project, but ePrints 3.2 advertises that it has user-defined datasets. It has to be said that adding a new dataset is easier said than documented, not to mention done. But I figured I’d document how I managed to get it to work.

To begin with I’m using a clean ePrints 3.2 install with a new repository. I’ve not done much configuration to it because I’m basically looking at how to add a dataset. I’ve installed this on a Debian 5.0 box and installed ePrints via the Debian package. It has to be the easiest way of doing things IMHO so I definitely recommend it. Debian is my preferred distribution so there is an allegiance there.

Once you’ve got ePrints set up with a new repository, check out the /archives/repositoryname/cfg/cfg.d directory and look for a “datasets.pl” file. This is the first step in the process. The file contains two samples: one that defines a dataset using a set of fields, and a second that uses a class you can derive from. In practice neither sample works as shipped, which is a bit disappointing. Fortunately the class-based sample works with one minor modification: you need to implement the “get_dataset_id” function in the class. This is relatively trivial to do, and then you’re on your way. A copy of the datasets.pl is included in a ZIP package linked at the end of this blog post.
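To give a feel for the class-based variant, here is a minimal sketch for a dataset called “service”. It assumes the `$c->{datasets}` and `$c->{fields}` config keys and the `EPrints::DataObj` base class follow the pattern of the shipped sample; the class name `EPrints::DataObj::Service` and the `serviceid` and `name` fields are hypothetical, so check every key against the sample in your own datasets.pl before relying on it.

```perl
# cfg.d/datasets.pl -- sketch only; verify key names against the shipped sample

# Register the dataset and point it at the class defined below.
$c->{datasets}->{service} = {
	class   => "EPrints::DataObj::Service",   # hypothetical class name
	sqlname => "service",                     # assumed: name used for the database table
};

# Fields for the dataset (the field names here are illustrative).
$c->{fields}->{service} = [
	{ name => "serviceid", type => "counter", required => 1, sql_counter => "serviceid" },
	{ name => "name",      type => "text" },
];

{
	package EPrints::DataObj::Service;

	our @ISA = qw( EPrints::DataObj );

	# The one modification described above: tell ePrints which dataset
	# objects of this class belong to.
	sub get_dataset_id
	{
		return "service";
	}
}
```

The braces around the package keep the class definition from leaking into the rest of the config file.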

Once we’ve got our datasets we’re going to need to update the repository structure to add them to the database. ePrints makes this relatively trivial with the following command:

/path/to/eprints3/bin/epadmin update_database_structure repositoryname

That will automatically connect to the relevant database, create tables and add fields where necessary. Very funky.

The next stop on the list is to create a workflow for the dataset. If you’re not familiar with how ePrints workflows behave then check out the ePrints wiki page on the Workflow Format for more details. The basics are that a workflow describes how fields are laid out on a screen, with some limited scripting capability. To begin with we need to head to our /archives/repositoryname/cfg/workflows directory and create a new directory named after the dataset. My dataset was called “service” so I created a new “service” directory. Because I’m lazy, I copied the user/default.xml file into service/default.xml and edited it to meet my needs, mostly deleting the existing items and adding the fields that I cared about. Again, if in doubt check out the documentation on workflows.

Now ePrints also has citation support and, you guessed it, we need to provide a citation. The Citation Format is documented in the wiki as well, and it is a bit simpler than the workflow format. To create one we jump to /archives/repositoryname/cfg/citations, create a new directory for the dataset (“service” again) and then, because I’m lazy, copy the user/default.xml file to service/default.xml and customise it as needed. The format is again pretty simple so follow your nose from what is on offer and, again, if in doubt check out the wiki.

At this point, restart your ePrints instance to ensure the configuration is fully reloaded. You then need to grant your users some roles before they can see the dataset. Jump into your favourite user, administer their account and add the roles there. There are a few different actions that appear to make sense:

  • view – required to see the dataset
  • create – add a new item to the dataset
  • edit – edit existing items in the dataset
  • destroy – remove an item from the dataset
  • details – display a details screen for an item in the dataset

Each of these can be turned into a role by adding “+datasetname/action” to the roles box on the user administration screen. In my particular case I added “+service/view”, “+service/create”, etc. Once you give yourself all of these roles you will be able to see the dataset under the “Manage Records” option (to the right of “Manage deposits”).
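To avoid typing each entry by hand, the role strings can be generated from the action list. This is a minimal sketch in plain Perl (no ePrints modules needed); the helper name `dataset_roles` is my own invention, and the action names are simply the five listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: build "+dataset/action" strings for one dataset.
sub dataset_roles
{
	my( $datasetid, @actions ) = @_;
	return map { "+$datasetid/$_" } @actions;
}

# One role per line, ready to paste into the roles box.
my @roles = dataset_roles( "service", qw( view create edit destroy details ) );
print join( "\n", @roles ), "\n";
```

Running it prints +service/view, +service/create, +service/edit, +service/destroy and +service/details, one per line.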

At this point you’re looking at it and it looks ugly because the strings aren’t translated properly. ePrints makes this easy: just click on the “Edit page phrases” button. It lists the phrases displayed on the screen, highlights the ones that are missing and gives you the option to add them. Fill in the blanks as you go and once you’ve added phrases for everything you’re ready to deploy your dataset!

So whilst we’re done and can edit the dataset ourselves, we probably want to let our users have a look at it as well. Adding these roles to each user individually isn’t fun; however, if we edit the /archives/repositoryname/cfg/cfg.d/user_roles.pl file and add the roles to the existing user role we get what we need. In addition to the roles we added previously (e.g. service/view), a generic user account will need the “datasets” role to be able to see the datasets tab. This makes the user_roles.pl section for user relatively simple:

# Roles granted to every account of type "user"
$c->{user_roles}->{user} = [qw{
        general
        edit-own-record
        saved-searches
        set-password
        deposit
        change-email
        +service/create
        +service/edit
        +service/view
        +datasets
}];

Pretty easy. At this point our users can add, edit and view records in the service dataset. Mission accomplished? Not quite. The next step is to clean everything up, add a brief citation file in addition to the default citation and, for me, work out how to have per-user items so that users can only edit their own. But I think that is enough for today. Hopefully I’ll be able to work out how to limit users to just their own items, but I might need to build my own screens. I’ve done this before with USQ’s Author ID project so I know how to do it, I’d just rather not.

1 comment

  1. leonardo February 4th, 2011 7:32 pm

    mhh good review but i cannot find
    “ZIP package linked at the end of this blog post”
