eAtlas metadata management

This page is out of date.

This page gives an overview of how the various components of the eAtlas metadata system connect with each other. It also shows how to resolve some of the problems that can occur when publishing metadata records.

Getting metadata records onto the eAtlas website requires some understanding of how the metadata system is set up. The eAtlas builds this system from several off-the-shelf pieces of software that are linked together. Using off-the-shelf software reduces the amount of development needed by the eAtlas team, but comes at the cost of looser integration and a few quirks that we have to live with.

eAtlas Metadata System workflow

Figure 1 shows the three software components that make up the eAtlas metadata system. GeoNetwork is where the original metadata records are written and published. The MetadataViewer converts these records into well-presented webpages that Google can index. Finally, Drupal harvests these records so that they are discoverable in the website search and are shown in the various dataset listings on the website.

The website only stores and shows a summary of the metadata records. It contains enough text to support good searching, but this text is never displayed to users. All links on the website link directly to the MetadataViewer. Internally, Drupal stores all text content elements as "Nodes". Each node has a title and body text as well as other information fields. Each type of content in Drupal (Site pages, Articles, People, Institutions, Projects, etc.) is a node with different fields attached to it. The metadata records are no different. They are nodes that are automatically populated from an RSS (Rich Site Summary) feed that lists the metadata records. This RSS feed gets its records from the MetadataViewer.
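As a rough illustration of how an RSS feed can be turned into node-like summaries, here is a minimal Python sketch. It is not the Drupal harvester's code. The sample feed content, the field mapping (title, description, link) and the function name feed_to_nodes are all assumptions made for this example, and the real feed may carry different fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; the real feed's fields and layout may differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example dataset feed</title>
    <item>
      <title>Example dataset A</title>
      <link>https://eatlas.org.au/data/uuid/aa1cd85a-a32d-4db2-a7b5-0c07a54fb572</link>
      <description>Summary text kept for searching.</description>
    </item>
    <item>
      <title>Example dataset B</title>
      <link>https://eatlas.org.au/data/uuid/00000000-0000-0000-0000-000000000000</link>
      <description>Another summary.</description>
    </item>
  </channel>
</rss>"""


def feed_to_nodes(feed_xml):
    """Turn each RSS <item> into a dict with a title, body and link."""
    root = ET.fromstring(feed_xml)
    nodes = []
    for item in root.iter("item"):
        nodes.append({
            "title": item.findtext("title", default=""),
            "body": item.findtext("description", default=""),
            # The link points at the MetadataViewer page for the record.
            "link": item.findtext("link", default=""),
        })
    return nodes


nodes = feed_to_nodes(SAMPLE_FEED)
for node in nodes:
    print(node["title"], "->", node["link"])
```

Each dictionary stands in for a node: the body is kept for searching, while the link sends the reader to the MetadataViewer page.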

Quirks - Why isn't my record on the website?

The system has a few quirks that need to be understood in order to manage the collection of metadata records on the website.

Periodic harvesting

The website uses batch harvesting, performed once per hour, to copy over all new metadata records. As a result, new records can take up to one hour to appear.

Harvesting external records

If a record in GeoNetwork has been harvested from an external GeoNetwork, such as from the AIMS GeoNetwork, then there is a good chance that the timestamps on the record will be "old" and so the periodic Drupal harvester will not detect the new record.
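The effect can be sketched with a small Python model. This is only an illustration: it assumes the periodic harvester keeps records whose timestamp is later than the previous harvest, which is a guess at its rule rather than its documented behaviour. The record titles, dates and the function name incremental_harvest are hypothetical.

```python
from datetime import datetime


def incremental_harvest(records, last_harvest):
    """Keep only records whose timestamp is newer than the last harvest."""
    return [r for r in records if r["modified"] > last_harvest]


# Hypothetical time of the previous hourly harvest.
last_harvest = datetime(2015, 6, 1, 12, 0)

records = [
    # Written directly in our GeoNetwork after the last harvest.
    {"title": "Local record", "modified": datetime(2015, 6, 1, 12, 30)},
    # Just harvested from an external GeoNetwork, but carrying an old timestamp.
    {"title": "External record", "modified": datetime(2012, 3, 14, 9, 0)},
]

picked_up = incremental_harvest(records, last_harvest)
missed = [r["title"] for r in records if r not in picked_up]
print("Missed by the periodic harvest:", missed)
```

Under this assumption the external record is skipped even though it only just arrived, which is why a complete reharvest is needed to bring it in.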

Unpublished metadata appearing on website

The MetadataViewer has a bug that allows unpublished records to be viewed. Going to a metadata record's "Point of Truth" URL asks the MetadataViewer to render that record. The MetadataViewer ignores whether the record has been published and simply displays it (this is the bug). It also retains a copy of the record in its cache, so when Drupal asks for any new records the MetadataViewer tells Drupal about the unpublished record, which makes the bug even worse. The "Point of Truth" for a record is the final published URL of the record and looks something like: https://eatlas.org.au/data/uuid/aa1cd85a-a32d-4db2-a7b5-0c07a54fb572
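The chain of events can be pictured with a toy Python model. This is not the MetadataViewer's code. The class ToyViewer, its methods and the record identifiers are hypothetical, and the model only assumes what is described above: rendering does not check the published flag, and the feed handed to Drupal is built from the cache.

```python
class ToyViewer:
    """Toy stand-in for the behaviour described above; not the real software."""

    def __init__(self, catalogue):
        self.catalogue = catalogue  # identifier -> {"title": ..., "published": bool}
        self.cache = {}

    def render(self, record_id):
        record = self.catalogue[record_id]
        # The described bug: record["published"] is never checked here.
        self.cache[record_id] = record
        return record["title"]

    def feed(self):
        # Anything that was rendered is listed for the harvester.
        return sorted(self.cache)

    def clear_cache(self):
        self.cache.clear()


catalogue = {
    "published-record": {"title": "Published dataset", "published": True},
    "draft-record": {"title": "Draft dataset", "published": False},
}

viewer = ToyViewer(catalogue)
viewer.render("published-record")
viewer.render("draft-record")  # someone opens the draft's Point of Truth URL

leaked = [rid for rid in viewer.feed() if not catalogue[rid]["published"]]
print("Unpublished records offered to the harvester:", leaked)

viewer.clear_cache()
after_clear = viewer.feed()
print("Feed after clearing the cache:", after_clear)
```

In this model a single visit to the draft's URL is enough to put it in the feed, and emptying the cache removes it again.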

Deletion of records doesn’t propagate well

Records that are deleted in GeoNetwork do not propagate well through the later stages. In particular, the default periodic harvester only asks for new records; it is never told about records that have been deleted.
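A short Python sketch shows why an add-only harvest leaves deleted records behind. It is a simplified model, not the harvester's implementation. The function name harvest_new and the identifiers uuid-a, uuid-b and uuid-c are placeholders.

```python
def harvest_new(local, source):
    """Additive harvest: copies records from the source, never removes any."""
    merged = set(local)
    merged.update(source)
    return merged


# Records available before and after a deletion in GeoNetwork.
source_before = {"uuid-a", "uuid-b", "uuid-c"}
website = harvest_new(set(), source_before)

source_after = source_before - {"uuid-b"}  # uuid-b is deleted at the source
website = harvest_new(website, source_after)

# Records the website still holds that no longer exist at the source.
stale = sorted(website - source_after)
print("Stale records left on the website:", stale)
```

Because the harvest only ever adds, the deleted record stays on the website until the harvested records are removed and imported again.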

Metadata System Quirks

How do we work around the quirks?

There are a few actions we can take to resolve most cases of metadata records not appearing on the website as expected. During normal operation, new metadata records are automatically harvested every hour and published onto the website, and you don't need to do any of the following steps. Use these steps only when there are problems.

This section uses links that have been added to the "Content administration" menu.

Links for managing the metadata records

1. Fix up GeoNetwork

We must make sure that our records are set up correctly in GeoNetwork. Are they published appropriately? Do they have their category set so that they will appear in the correct eAtlas? If they are harvested records, they will be unpublished by default.

2. Clear MetadataViewer Cache

We can clear the cache of the MetadataViewer.

You will find this link in the "Content Administration" menu. It is labelled "1. Clear metadata records (MetadataViewer)".

This makes the MetadataViewer forget all the records it holds, including unpublished records that were accidentally viewed and records that have since been deleted.

3. Manage the harvested Drupal metadata records

We can manage the harvested Drupal records using the "2. Manage metadata harvest (Drupal)" link in the "Content Administration" menu. This takes us to a page that displays all the RSS feeds coming into the site. The metadata records are brought in by the "Dataset records" feed.

Removing metadata records

Click on "remove items" for the "Recent datasets" row. This displays a warning that you are about to remove all the downloaded items. This is OK, as we will reharvest them in the very next step.

Removing all harvested metadata records

Once all the records have been removed, we need to retrigger a harvest that will cause Drupal to request all records, regardless of how old they are.

Reharvesting metadata records

Once the harvest is complete, you should see that the number of items downloaded is approximately 200.

You can now check that the dataset records are all correctly imported into the website.

Warnings

Don't manually edit metadata records in Drupal

All the metadata records that are harvested by Drupal appear as editable nodes, so you can go in and edit or delete them. Unfortunately, the next time someone triggers a complete reharvest, all the modifications you made will be lost. Remember that GeoNetwork holds the master copy of the records, and problems need to be fixed there.

Don't modify the other RSS feeds

The permissions system in Drupal does not allow us to grant permission to administer just one of the RSS feeds. Instead, you have access to all the feeds used in the site. Please don't modify any feeds other than the "Recent datasets" feed.