Describe, disseminate, discover: metadata for effective data citation

Workshop report - Describe, disseminate, discover: metadata for effective data citation

The second in a series of JISC-funded workshops, designed to support good data citation practices among the UK research community, took place at the British Library on July 6th. The workshop focused on metadata for data citation and, in addition to a detailed look at the DataCite schema and how it was developed, featured presentations from a number of DataCite service users who have already incorporated the metadata schema into their workflows.

The workshop was very well attended by representatives from library, repository, publishing and research backgrounds, among others. The high level of interest in this topic seems to reflect the growing awareness in the UK HE community - partly driven by the recent mandates from the research funding councils - of the importance of ensuring that data is accessible and re-usable in the long term.

Alex Ball (UKOLN) set the scene with a useful overview of the issues related to applying and using metadata. He identified three distinct activities that good metadata supports, and which should therefore be considered when deciding what metadata your data needs: known-item search, speculative search and re-use. He balanced this with what he sees as the major unresolved obstacles to widespread data citation: how to deal with data that has multiple contributors and how to manage dynamic datasets.

He also presented the results of an interesting study that he produced in 2009 which looked at a range of different data citation styles and identified the elements that they shared. Key citation elements were identified as: Author, Publication Date, Title and Location (or Identifier). These correspond closely with DataCite’s five mandatory metadata properties.

Elizabeth Newbold from the British Library introduced the DataCite schema and explained the background to its development, and the role of the metadata working group in overseeing revisions to the schema. She explained the reasons for the choice of the five mandatory metadata properties and encouraged depositors to consider the twelve optional properties to enhance the discoverability of their data.

Two case studies followed. First, David Boyd from the University of Bristol gave an overview of his work on data.bris: a JISC Managing Research Data funded project to build a pilot data repository service for the university's Arts Faculty (ultimately to be extended university-wide). The project team has recently developed a minimal mandatory metadata set, based on the DataCite mandatory properties, that will ultimately be required when depositing data to the repository.
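To illustrate how a mandatory set of this kind might be enforced at deposit, here is a minimal sketch in Python. It assumes the five DataCite mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear); the field names, function names and sample record are hypothetical and are not taken from data.bris.

```python
# Hypothetical field names, modelled on the five DataCite mandatory properties.
MANDATORY_FIELDS = ("identifier", "creator", "title", "publisher", "publication_year")


def is_blank(value):
    """Treat None, empty or whitespace-only strings, and empty lists as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_mandatory_fields(record):
    """Return the mandatory fields that are absent or blank in a deposit record."""
    return [field for field in MANDATORY_FIELDS if is_blank(record.get(field))]


# Illustrative record only: the title is blank and the year is absent.
example_record = {
    "identifier": "10.1234/example",
    "creator": ["Smith, Jane"],
    "title": "   ",
    "publisher": "Example University",
}
print(missing_mandatory_fields(example_record))
```

A repository could refuse a deposit, or prompt the depositor, whenever the returned list is not empty.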

The project is still at an early stage of development and ongoing work focuses on making the data repository compatible with the University’s wider Research Information System (via exchange of metadata) and on more general issues such as managing version control and linking datasets to publications.

Next up, Michael Charno from the Archaeology Data Service (ADS) presented the data centre perspective. One of the major challenges for any body charged with looking after data in the long term is information degradation. Michael illustrated the risk of data loss over the longer term, underlining the need to capture metadata as early as possible. Another difficulty is the diversity of data that archaeological research generates: from texts and images to virtual reality simulations.

Michael demonstrated the range of resources ADS offers. For data depositors there are the "Guides to Best Practice", which cover a range of data management topics and include examples of completed metadata records. For data users there are the impressive discovery services that the ADS hosts (including a unique "grey literature" search), which are freely accessible online. These services are supported by the detailed metadata, at both project and file level, that is entered at data deposit.

Rachael Kotarski then presented a summary of her work on preparing datasets for inclusion in the British Library’s online catalogue. The metadata had to be integrated into the BL’s existing catalogue systems before cataloguing could begin on individual data records. This was a challenge for library cataloguers, who were not used to dealing with datasets.

The final two presentations focused on discovery and on services that are underpinned by deposited metadata. Steve Donegan (STFC) described the discovery metadata used by NERC (the Natural Environment Research Council). NERC’s data centres are among the longest-established in the UK and have a very broad reach, encompassing, among others, the British Antarctic Survey (BAS), British Atmospheric Data Centre (BADC) and the Archaeology Data Service (ADS).

NERC have developed their own Discovery Metadata Standard for data that is deposited in their repositories. This metadata is searchable through the NERC Data Catalogue.

Steve's talk highlighted the challenge of complying with international directives and standards, in particular EU INSPIRE: a European directive which relates specifically to spatial information and requires the collection of certain core metadata elements for compliance.

The final presentation of the day was from David Shotton who, as a bioinformatician at Oxford University, has a longstanding interest in the sharing of research data and has been involved in a range of data management projects in recent years. Professor Shotton presented several strands of work by him and his colleagues: mapping the DataCite schema to RDF for linked data applications (his talk also included a brief introduction to RDF for the uninitiated); developing a DataCite metadata input form, which generates an XML record from information entered into a simple web form; and plans to extend the Open Citation Corpus - a database of citations to biomedical papers - to include datasets.
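As a rough illustration of the form-to-XML step, the Python sketch below builds a record from a dictionary of form values. It is not Professor Shotton's tool: the function name, the form keys and the sample values are hypothetical. The element names follow the DataCite mandatory properties, but the schema namespace and version attributes are deliberately left out, so the output would need those added before it could be validated against the schema.

```python
import xml.etree.ElementTree as ET


def build_datacite_record(form):
    """Build a minimal DataCite-style XML record from web-form values.

    `form` is a hypothetical dict with the keys: doi, creators (a list of
    names), title, publisher and year.
    """
    resource = ET.Element("resource")

    # Identifier: the DOI, with its type recorded as an attribute.
    identifier = ET.SubElement(resource, "identifier", identifierType="DOI")
    identifier.text = form["doi"]

    # Creators: one <creator> element per name entered in the form.
    creators = ET.SubElement(resource, "creators")
    for name in form["creators"]:
        creator = ET.SubElement(creators, "creator")
        ET.SubElement(creator, "creatorName").text = name

    # Title, publisher and publication year.
    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = form["title"]
    ET.SubElement(resource, "publisher").text = form["publisher"]
    ET.SubElement(resource, "publicationYear").text = str(form["year"])

    # Serialise to a text string rather than bytes.
    return ET.tostring(resource, encoding="unicode")


# Illustrative values only; the DOI is a placeholder, not a registered one.
example_form = {
    "doi": "10.1234/example",
    "creators": ["Smith, Jane", "Jones, Sam"],
    "title": "Example dataset",
    "publisher": "Example University",
    "year": 2012,
}
print(build_datacite_record(example_form))
```

Building the record with an XML library, rather than by joining strings, means that characters such as "&" or "<" in a title are escaped automatically.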

All in all, the workshop was very well received and provided a good opportunity for those working in data management to share ideas and best practices from their own experience. The next workshop will take place in October (date tbc). Further details will be published on the British Library website.

Links:

Workshop presentations can be accessed in full on the British Library website.

An archive of tweets posted about the workshop is available here (thanks to @jezcope).

A Mendeley reading list of articles and resources relevant to these workshops is available here.
