Metadata
Data that provides information about other data [wikipedia].
Metadata is structured (and often standardised) information associated with a (data) resource, that provides information about the resource itself.
Metadata provides context and pragmatics (which may be general or domain-specific) for its resource.
Purpose
Metadata allows you to:
Make resources findable, by providing a high-level overview which can be inserted into a search index
Make resources reusable, by providing information about how they were generated in the first place.
Thinking about types of metadata allows you to:
Decide who should produce metadata, and when that should happen
Types of Metadata
A 'type' of metadata is a broad-brush classification of what a metadata element is for and (correspondingly) who might create it and when. The How To Fair project describes three different types of metadata: Administrative, Descriptive and Structural. Wikipedia defines many more!
How To Fair's description of Structural metadata incorporates two aspects. We've split them into "provenance" and "form" to help highlight the different roles and times involved.
Administrative metadata is relevant for managing data, for example:
Project
Resource owner
Collaborators
Funder
Organisation
License
These can usually be assigned before you collect or create the data resource itself.
Descriptive (citation) metadata allows people to discover and identify the resource:
Author
Title
Abstract
Keywords
Topic
Persistent identifier
Related resources
These are usually assigned at the point of publication.
Tip
To facilitate data discovery, descriptive metadata can be made far more powerful than merely a citation.
For example, "find data containing rainfall in Africa last year" requires that a search index or graph is populated with temporal and locational values extracted from the data or its structural metadata.
In such cases "structural" metadata might also be thought of as "descriptive", and the lines blur between them. Extra search fields might include, for example:
Datetimes or ranges
Geolocations
External conditions
Other content-derived fields
Data type
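The rainfall query above can be sketched as a filter over index entries that carry a time range and a bounding box alongside their keywords. This is only an illustration: the IndexEntry class, the dataset identifiers, the sample values and the rough bounding box standing in for "Africa" are all assumptions, not part of any real index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexEntry:
    """Hypothetical search-index entry built from a resource's metadata."""
    identifier: str
    keywords: set
    start: date   # first day covered by the data
    end: date     # last day covered by the data
    bbox: tuple   # (west, south, east, north) in decimal degrees


def overlaps(entry, start, end, bbox):
    """True if the entry's time range and bounding box both intersect the query."""
    time_ok = entry.start <= end and start <= entry.end
    west, south, east, north = entry.bbox
    q_west, q_south, q_east, q_north = bbox
    space_ok = (west <= q_east and q_west <= east
                and south <= q_north and q_south <= north)
    return time_ok and space_ok


def search(index, keyword, start, end, bbox):
    """Return identifiers of entries matching the keyword, period and area."""
    return [entry.identifier for entry in index
            if keyword in entry.keywords and overlaps(entry, start, end, bbox)]


# Illustrative entries only; the values are made up for this sketch
index = [
    IndexEntry("dataset-a", {"rainfall"}, date(2023, 1, 1), date(2023, 12, 31),
               (10.0, -5.0, 40.0, 15.0)),
    IndexEntry("dataset-b", {"rainfall"}, date(2015, 1, 1), date(2015, 12, 31),
               (10.0, -5.0, 40.0, 15.0)),
    IndexEntry("dataset-c", {"temperature"}, date(2023, 1, 1), date(2023, 12, 31),
               (10.0, -5.0, 40.0, 15.0)),
]

# Very rough bounding box used as a stand-in for "Africa" (assumption)
AFRICA = (-18.0, -35.0, 52.0, 38.0)

hits = search(index, "rainfall", date(2023, 1, 1), date(2023, 12, 31), AFRICA)
print(hits)
```

Without the temporal and locational fields in the index, the same query could only match on keywords and would return both rainfall datasets regardless of year.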
Structural (provenance) metadata describes how a resource came about, for example:
Collection method
Sampling procedure
Assumptions made
Researcher notes
These metadata have to be gathered by the researchers according to best practice in their research community. They should be added continuously throughout data generation and processing.
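One way to add provenance continuously is to append a small, timestamped entry whenever a collection or processing step happens. The sketch below assumes a hypothetical record_step helper and invented field names and values; your research community's best practice may prescribe a different vocabulary.

```python
import json
from datetime import datetime, timezone


def record_step(log, step, **details):
    """Append one timestamped provenance entry to the log and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        **details,
    }
    log.append(entry)
    return entry


provenance = []

# Illustrative entries only; field names and values are made up
record_step(provenance, "collection", method="rain gauge", sampling="hourly")
record_step(provenance, "processing",
            assumption="gaps under 2 hours interpolated",
            note="gauge 7 recalibrated mid-campaign")

# One JSON object per line, so later entries can simply be appended
print("\n".join(json.dumps(entry) for entry in provenance))
```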
Tip
The semantics used in defining the data and its structural metadata should provide meaning and context to the data in a formal and machine-readable way. However, where richer meaning or context is difficult or impossible to formally capture, this structural metadata should be used to convey such information.
Structural (form) metadata describes how a resource is internally structured, for example:
Data size
Storage details (e.g. file types, encodings and/or database details)
Structural (form) metadata arises from the decisions taken by the data engineering team, who should architect their infrastructure in close collaboration with researchers to anticipate size, content and format.
These details should (ideally) be established before data generation begins (see "de-risking a project"), but may evolve throughout the lifetime of a resource.
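Some structural (form) fields can be read straight from a stored file. The following minimal sketch assumes a hypothetical form_metadata helper and a throwaway CSV file created just for the example. Note that the media type is guessed from the file name and the encoding is declared by the caller rather than detected.

```python
import mimetypes
import os
import tempfile


def form_metadata(path, encoding="utf-8"):
    """Collect a few structural (form) fields for a file on disk."""
    media_type, _ = mimetypes.guess_type(path)
    return {
        "size_bytes": os.path.getsize(path),
        "extension": os.path.splitext(path)[1],
        "media_type": media_type,  # guessed from the file name only
        "encoding": encoding,      # declared by the caller, not detected
    }


# Write a small illustrative file, then describe it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "readings.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("station,rainfall_mm\nA,12.5\n")
    record = form_metadata(path)

print(record)
```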
The Dublin Core Metadata Initiative
The Dublin Core Metadata Initiative (DCMI) is an organisation dedicated to metadata. It publishes a set of fifteen widely used metadata elements. If you are publishing a data resource on the web, these elements serve as a guide to which metadata should be included in order to best facilitate information discovery.
The fifteen elements are listed below; see the DCMI elements page for full descriptions.
Contributor
Coverage
Creator
Date
Description
Format
Identifier
Language
Publisher
Relation
Rights
Source
Subject
Title
Type
Whilst metadata need not be limited to these elements (particularly in cases where rich descriptive metadata can be used to query for datasets, as opposed to search-based methods), adhering to the Dublin Core should help your resource rank well in searches and help you conform to the FAIR principles.
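As an illustration, a record using some of the fifteen elements can be serialised as XML in the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/). In this sketch the wrapping "metadata" element, the to_dublin_core_xml helper and the sample values are assumptions made for the example; a real publishing platform will have its own container format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen element names, lower-cased as they appear in the namespace
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Use the conventional "dc" prefix when serialising
ET.register_namespace("dc", DC_NS)


def to_dublin_core_xml(record):
    """Serialise a dict of Dublin Core elements to an XML string."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only
xml_text = to_dublin_core_xml({
    "title": "Example rainfall observations",
    "creator": "A. Researcher",
    "date": "2023-06-01",
})
print(xml_text)
```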
Purpose
The Dublin Core allows you to:
Quickly define a set of metadata to make a published data resource findable
Tip
As discussed in Structural (form) metadata, a schema can be an important element of your metadata, for describing the content of the data resource.
It is also possible to use a schema in a different way: not to describe the resource itself, but to describe the associated metadata!
This allows you to check that the provided metadata is correct. The Dublin Core Metadata Initiative publishes schemas for just this purpose.
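The idea of checking metadata against a schema can be sketched with a small hand-written rule set. This is not the DCMI schema: the SCHEMA dictionary, its rules and the validate helper are assumptions made purely to show the principle.

```python
import re

# Hypothetical schema: which fields are required and what their values look like
SCHEMA = {
    "title": {"required": True, "pattern": r".+"},
    "creator": {"required": True, "pattern": r".+"},
    "date": {"required": True, "pattern": r"\d{4}-\d{2}-\d{2}"},
    "language": {"required": False, "pattern": r"[a-z]{2,3}"},
}


def validate(record, schema=SCHEMA):
    """Return a list of problems found in a metadata record (empty if valid)."""
    errors = []
    for field, rule in schema.items():
        if field not in record:
            if rule["required"]:
                errors.append(f"missing required field: {field}")
            continue
        if not re.fullmatch(rule["pattern"], str(record[field])):
            errors.append(f"invalid value for {field}: {record[field]!r}")
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


good = {"title": "Example rainfall observations",
        "creator": "A. Researcher", "date": "2023-06-01"}
bad = {"title": "Untitled", "date": "last year"}

print(validate(good))
print(validate(bad))
```

In practice you would normally rely on a published schema and an established validator rather than hand-written rules like these.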
Useful syntaxes for metadata
We think the following syntaxes will be most useful for definition (but this is by no means an exhaustive list):