Darwin Core Archive including OBIS-ENV-DATA

Darwin Core Archive (DwC-A) is the standard for publishing biodiversity data using Darwin Core terms, and it is the preferred format for publishing data in OBIS and GBIF. The conceptual data model of a Darwin Core Archive is a “star schema” with a core record, such as an occurrence or an event, at the center of the star. Extension records, radiating out from the core, can optionally be associated with it, linked by a database key such as an ID column. An archive therefore contains exactly one core file, optionally linked to multiple extension files, so the entire schema is only two levels deep: a single core with zero, one, or many extensions. Each core-to-extension relationship can be one-to-one, where there is only one extension record for each core record (also called “Simple Darwin Core”), or one-to-many, where, for example, many environmental or biometric measurements and/or many biological occurrence records can be associated with a single sampling event.

The biodiversity data and metadata are published using GBIF’s Integrated Publishing Toolkit (IPT). The IPT helps the user map data to valid Darwin Core terms, then archives and compresses the Darwin Core content together with: (i) a descriptor file, meta.xml, which maps the core and extension files to Darwin Core terms and describes how the core and extension files are linked, and (ii) the eml.xml file, which contains the dataset metadata in Ecological Metadata Language (EML) format. For instructions on how to enter the metadata, see EML. All these components (i.e. core file, extension files, descriptor file and metadata file), compressed together as a .zip file, make up the Darwin Core Archive.
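To make the star schema concrete, the Python sketch below writes a one-event archive: an Event Core file, one eMoF extension file, a meta.xml descriptor and a placeholder eml.xml, zipped together. It is not how the IPT builds archives; the file names, the tab-separated layout, the example values and the rowType URIs (in particular the OBIS eMoF rowType) are assumptions to check against the extension definitions your IPT installation uses. Column 0 of each file holds the key, declared as `id` in the core and `coreid` in the extension, which is how meta.xml expresses the core-to-extension link.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Assumed rowType URIs; verify against the extension definitions in your IPT.
EVENT_ROWTYPE = DWC + "Event"
EMOF_ROWTYPE = "http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact"


def to_tsv(header, rows):
    """Serialize a header plus rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def describe(tag, key_tag, location, row_type, header):
    """Build the meta.xml element for one data file; column 0 is the key."""
    fields = "".join(
        f'<field index="{i}" term="{DWC}{name}"/>' for i, name in enumerate(header)
    )
    return (
        f'<{tag} encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" '
        f'ignoreHeaderLines="1" rowType="{row_type}">'
        f"<files><location>{location}</location></files>"
        f'<{key_tag} index="0"/>{fields}</{tag}>'
    )


def build_archive(event_header, events, emof_header, measurements, eml_xml):
    """Zip one core file, one extension, meta.xml and eml.xml into bytes."""
    meta = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">'
        + describe("core", "id", "event.txt", EVENT_ROWTYPE, event_header)
        + describe("extension", "coreid", "extendedmeasurementorfact.txt",
                   EMOF_ROWTYPE, emof_header)
        + "</archive>"
    )
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("event.txt", to_tsv(event_header, events))
        zf.writestr("extendedmeasurementorfact.txt", to_tsv(emof_header, measurements))
        zf.writestr("meta.xml", meta)
        zf.writestr("eml.xml", eml_xml)
    return out.getvalue()


if __name__ == "__main__":
    archive = build_archive(
        ["eventID", "eventDate", "decimalLatitude", "decimalLongitude"],
        [["station1_2020", "2020-06-01", "51.2", "2.9"]],
        ["eventID", "measurementType", "measurementValue", "measurementUnit"],
        [["station1_2020", "temperature", "15.2", "degrees Celsius"],
         ["station1_2020", "salinity", "34.1", ""]],
        "<eml/>",  # placeholder only; real EML metadata is far richer
    )
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        print(sorted(zf.namelist()))
```

Because every eMoF row repeats the core’s eventID, one event can carry any number of measurement rows, which is the one-to-many case described above.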

OBIS-ENV-DATA

Data collected as part of marine biological research often include measurements of habitat features, such as physical and chemical variables of the environment; biometric measurements (e.g. body size, or abundance and biomass combined); and details regarding the nature of the sampling or observation methods, equipment, and effort.

In the past, OBIS only dealt with Occurrence Core, and additional measurements were added in a structured format (e.g. JSON) in the Darwin Core term dynamicProperties. This was far from ideal: the format was difficult to work with, the terms were not standardised, and the measurements were difficult to extract.
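The extraction problem is easy to see in code: every measurement hides inside one JSON string per record. A minimal Python sketch of moving such content into eMoF-style rows, assuming a flat JSON object with hypothetical keys; units and vocabulary IDs cannot be recovered from a bare key, so they are left blank for the data provider to fill in.

```python
import json


def dynamic_properties_to_emof(occurrence_id, dynamic_properties):
    """Turn one dynamicProperties JSON string into eMoF-style rows.

    Assumes a flat JSON object, e.g. '{"length": 12.5, "sex": "female"}'.
    """
    properties = json.loads(dynamic_properties)
    rows = []
    for key, value in properties.items():
        rows.append({
            "occurrenceID": occurrence_id,
            "measurementType": key,          # free text; map to a vocabulary later
            "measurementValue": str(value),
            "measurementUnit": "",           # unknown: not stored in the JSON
        })
    return rows


for row in dynamic_properties_to_emof("occ-1", '{"length": 12.5, "sex": "female"}'):
    print(row)
```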

With the release and adoption of a new Core type, Event Core, OBIS can now go beyond species-occurrence-based records and make the sampling event the central data entity. Biological, environmental, and sampling information is then linked to the appropriate event level using the Occurrence Extension and the MeasurementOrFact Extension.
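One way to read “the appropriate event level”: events can be nested through the parentEventID term of the Event Core, for example a cruise containing stations that in turn contain individual samples. The Python sketch below, using hypothetical eventIDs, walks those parentEventID links so that a record attached to a sample can be traced up to its station and cruise. The nesting depth and naming scheme are illustrative assumptions only.

```python
def event_lineage(event_id, parent_of):
    """Walk parentEventID links from an event up to its top-level event.

    parent_of maps eventID -> parentEventID (None for a top-level event).
    A parent missing from the mapping is treated as top-level.
    Returns the chain ordered from the top-level event down to event_id.
    """
    chain = []
    seen = set()
    current = event_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parentEventID links at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return list(reversed(chain))


# Hypothetical hierarchy: cruise -> station -> sediment core sample.
parents = {
    "cruise1": None,
    "cruise1_st1": "cruise1",
    "cruise1_st1_core1": "cruise1_st1",
}
print(event_lineage("cruise1_st1_core1", parents))
```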

Extended MeasurementOrFact Extension (eMoF)

As part of the IODE pilot project Expanding OBIS with environmental data (OBIS-ENV-DATA), OBIS introduced a customized Extended MeasurementOrFact Extension (eMoF), which extends GBIF’s Darwin Core MeasurementOrFact Extension with four new terms: occurrenceID, measurementTypeID, measurementValueID and measurementUnitID.

Figure: overview of an OBIS-ENV-DATA format. Sampling parameters, abiotic measurements, and occurrences are linked to events using the eventID (full lines). Biotic measurements are linked to occurrences using the new occurrenceID field of the ExtendedMeasurementOrFacts extension (dashed lines).

The eMoF Extension is used in combination with the Event Core and the Occurrence Extension to capture both abiotic and biotic measurements (so-called combined datasets). The occurrenceID links biotic measurements in the eMoF Extension to the Occurrence Extension. This is in addition to the eventID, which links the eMoF to the Event Core; that link is always necessary, because in a star schema every extension is linked to the core file. Abiotic measurements and sampling facts in the eMoF are linked to the Event Core through the eventID alone (no occurrenceID is needed). The eMoF file is therefore used to store organism quantifications (e.g. abundance, wet weight biomass, % live cover), species biometrics (e.g. body length), facts documenting a specimen (e.g. living/dead, behaviour, trophic status), facts documenting the sampling activity (e.g. sampling device, sampled area, sampled volume, sieve mesh size), and abiotic measurements (e.g. temperature, salinity, oxygen, sediment grain size, habitat features).
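The linking rules above can be checked mechanically. This Python sketch, using hypothetical IDs, sorts eMoF rows into biotic rows (those with an occurrenceID) and event-level rows (abiotic measurements and sampling facts), and rejects rows whose keys point nowhere. It also assumes that a biotic measurement and its occurrence must share the same eventID, which follows from both being attached to that event in the star schema; treat that check as an assumption rather than an OBIS validation rule.

```python
def classify_emof(events, occurrences, measurements):
    """Split eMoF rows into biotic and event-level groups.

    events: set of eventIDs present in the Event Core.
    occurrences: dict occurrenceID -> eventID from the Occurrence Extension.
    measurements: list of dicts with an eventID and an optional occurrenceID.
    """
    biotic, event_level = [], []
    for row in measurements:
        event_id = row["eventID"]
        if event_id not in events:
            raise ValueError(f"eventID {event_id!r} not found in Event Core")
        occurrence_id = row.get("occurrenceID")
        if occurrence_id:
            # Assumed rule: the occurrence must belong to the same event.
            if occurrences.get(occurrence_id) != event_id:
                raise ValueError(
                    f"occurrenceID {occurrence_id!r} is not linked to event {event_id!r}"
                )
            biotic.append(row)
        else:
            # No occurrenceID: abiotic measurement or sampling fact.
            event_level.append(row)
    return biotic, event_level


biotic, event_level = classify_emof(
    {"st1_sample1"},
    {"occ1": "st1_sample1"},
    [
        {"eventID": "st1_sample1", "occurrenceID": "occ1", "measurementType": "body length"},
        {"eventID": "st1_sample1", "measurementType": "temperature"},
    ],
)
print(len(biotic), len(event_level))
```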

The MoF terms measurementType, measurementValue and measurementUnit are completely unconstrained and can be populated with free text. While free text offers the advantage of capturing complex and as yet unclassified information, the inevitable semantic heterogeneity (e.g. in spelling or wording) becomes a major challenge for effective data integration and analysis. Hence, OBIS added three new terms, measurementTypeID, measurementValueID and measurementUnitID, to standardise measurement types, values and units. Note that measurementValueID is only used for standardising sampling facts (e.g. sampling instrument), not measurements. The three new terms should be populated with terms from controlled vocabularies, referenced by Uniform Resource Identifiers (URIs). OBIS recommends using the internationally recognized NERC Vocabulary Server, developed by the British Oceanographic Data Centre (BODC).
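A minimal Python sketch of the two rules in this paragraph: the three ID terms should hold URIs, and measurementValueID belongs on facts rather than on measurements. Deciding “measurement” by whether the value parses as a number is a heuristic assumption, not an OBIS rule. The example URIs are believed to be the NERC Vocabulary Server entries for temperature of the water body (P01 TEMPPR01) and degrees Celsius (P06 UPAA); verify them on the server before reuse.

```python
URI_PREFIXES = ("http://", "https://")


def is_number(text):
    """Heuristic: treat values that parse as floats as measurements."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def vocabulary_issues(row):
    """Return a list of problems with the *ID terms of one eMoF row."""
    issues = []
    for term in ("measurementTypeID", "measurementValueID", "measurementUnitID"):
        value = row.get(term, "")
        if value and not value.startswith(URI_PREFIXES):
            issues.append(f"{term} is not a URI: {value!r}")
    if row.get("measurementValueID") and is_number(row.get("measurementValue", "")):
        issues.append("measurementValueID set on a numeric measurement")
    return issues


temperature = {
    "measurementType": "temperature",
    "measurementTypeID": "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/",
    "measurementValue": "15.2",
    "measurementUnit": "degrees Celsius",
    "measurementUnitID": "http://vocab.nerc.ac.uk/collection/P06/current/UPAA/",
}
print(vocabulary_issues(temperature))
print(vocabulary_issues({"measurementValue": "15.2", "measurementUnitID": "degC"}))
```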

Measurement or Fact vocabulary

For an overview of the most common parameters in OBIS, each linked to its proposed BODC vocabulary term, see Measurement or Fact vocabulary. In case of missing terms, the following vocabularies should be used:

OBIS-ENV-DATA and Darwin Core terms

The DwC terms that are most relevant to OBIS, organized in the OBIS-ENV-DATA format, are the following (those in italics are mandatory):

Event Core

eventID, parentEventID, eventDate, habitat, minimumDepthInMeters, maximumDepthInMeters, decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters, footprintWKT, modified

Occurrence Extension

eventID, occurrenceID, scientificName, scientificNameAuthorship, scientificNameID, kingdom, taxonRank, identificationQualifier, occurrenceStatus, basisOfRecord, modified

Extended MeasurementorFact Extension

measurementID, eventID, occurrenceID, measurementType, measurementTypeID, measurementValue, measurementValueID, measurementUnit, measurementUnitID, measurementAccuracy, measurementRemarks
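The three term lists above can double as a simple header check. Because the mandatory subset is not reproduced in this list, the Python sketch below only checks the key columns that the star schema itself relies on (eventID in every file, plus occurrenceID in the Occurrence Extension, which biotic eMoF rows reference) and flags columns outside the lists. Those flagged columns are not necessarily invalid, since Darwin Core defines many more terms; the key-column choice is an assumption for illustration.

```python
# Term lists copied from the OBIS-ENV-DATA lists above.
RECOMMENDED_TERMS = {
    "event": {"eventID", "parentEventID", "eventDate", "habitat",
              "minimumDepthInMeters", "maximumDepthInMeters", "decimalLatitude",
              "decimalLongitude", "coordinateUncertaintyInMeters", "footprintWKT",
              "modified"},
    "occurrence": {"eventID", "occurrenceID", "scientificName",
                   "scientificNameAuthorship", "scientificNameID", "kingdom",
                   "taxonRank", "identificationQualifier", "occurrenceStatus",
                   "basisOfRecord", "modified"},
    "emof": {"measurementID", "eventID", "occurrenceID", "measurementType",
             "measurementTypeID", "measurementValue", "measurementValueID",
             "measurementUnit", "measurementUnitID", "measurementAccuracy",
             "measurementRemarks"},
}

# Assumed key columns per file in an Event Core archive.
KEY_COLUMNS = {
    "event": ["eventID"],
    "occurrence": ["eventID", "occurrenceID"],
    "emof": ["eventID"],
}


def check_header(file_kind, header):
    """Return (columns outside the lists above, missing key columns)."""
    extra = [col for col in header if col not in RECOMMENDED_TERMS[file_kind]]
    missing = [key for key in KEY_COLUMNS[file_kind] if key not in header]
    return extra, missing


print(check_header("occurrence", ["occurrenceID", "scientificName", "lifeStage"]))
# (['lifeStage'], ['eventID'])
```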

When to use Event Core

• When the dataset contains abiotic measurements, or other measurements which are related to a sample.
• When specific details are known about how a biological sample was taken and processed. These details can be expressed using the eMoF and the newly developed Q01 vocabulary.

Event Core should be used in combination with the Occurrence Extension and the eMoF.

When to use Occurrence Core

• There is no information on how the data were sampled or how the samples were processed.
• No abiotic measurements were taken or are provided.
• This is often the case for museum collections, citations of occurrences from the literature, and individual sightings.

Datasets formatted in Occurrence Core should use the eMoF to record any abundances, biomasses, and other biotic measurements or facts.
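The two checklists above amount to a small decision rule. This Python sketch encodes it with three hypothetical yes/no inputs; real datasets rarely reduce so cleanly, so read it as a summary of the guidance rather than a substitute for judgment.

```python
def choose_core(has_sampling_details, has_abiotic_measurements,
                has_biotic_measurements):
    """Return (core, extensions) following the guidance above.

    has_sampling_details: anything known about how samples were taken/processed.
    has_abiotic_measurements: abiotic or other sample-related measurements.
    has_biotic_measurements: abundances, biomasses, biometrics, specimen facts.
    """
    if has_sampling_details or has_abiotic_measurements:
        # Event Core is combined with the Occurrence Extension and the eMoF.
        return "Event", ["Occurrence", "ExtendedMeasurementOrFact"]
    # Occurrence Core: add the eMoF only when there are biotic measurements.
    extensions = ["ExtendedMeasurementOrFact"] if has_biotic_measurements else []
    return "Occurrence", extensions


# A museum collection with specimen counts but no sampling details:
print(choose_core(False, False, True))
```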