- OBIS and Darwin Core
- Darwin Core terms
- Darwin Core Archive
Darwin Core is a body of standards for biodiversity informatics. It provides stable terms and vocabularies for sharing biodiversity data. Darwin Core is maintained by TDWG (Biodiversity Information Standards, formerly The International Working Group on Taxonomic Databases).
OBIS and Darwin Core
The OBIS schema was based on Simple Darwin Core, a subset of Darwin Core which does not allow any structure beyond rows and columns. It added some terms which were important for OBIS but not supported by Darwin Core at the time. OBIS is now transitioning to Darwin Core.
Darwin Core terms
This is an overview of the most important Darwin Core terms to consider when contributing to OBIS, with guidelines regarding their use. A spreadsheet template with all terms relevant for OBIS can be found here.
OBIS currently has seven required fields:
Taxonomy and identification
The following terms are related to scientific name:
The following terms are related to the identification:
scientificName should always contain the originally recorded scientific name, even if it is invalid. This is necessary to be able to trace records back to the original dataset. The name should be at the lowest possible taxonomic rank.
We recommend not including authorship in scientificName, and using only scientificNameAuthorship for that purpose.
A WoRMS LSID should be added in scientificNameID. OBIS will use this identifier to link the record to the accepted taxonomic name. Go to the namematching tool to find out how to get the LSIDs from WoRMS.
taxonRank can help us identify the taxon that scientificName refers to and avoid linking to homonyms, although it is not necessary when a scientificNameID is provided.
OBIS recommends providing information about how an identification was made, for example by key, by expert, or by on-board species guide, or by morphology versus genomics. Who made the taxonomic identification can go in identifiedBy, and when in dateIdentified. Use the ISO 8601:2004(E) standard for date and time; for instructions see Time. References used for the identification, such as field guides, can be listed in
identificationReferences. Any other information can be added to
In case of uncertain identifications, qualifiers such as cf. or aff. should go in identificationQualifier.
scientificName     scientificNameAuthorship  scientificNameID                           taxonRank  identificationQualifier
-----------------  ------------------------  -----------------------------------------  ---------  -----------------------
Lanice conchilega  Pallas, 1766              urn:lsid:marinespecies.org:taxname:131495  species
Gadus              Linnaeus, 1758            urn:lsid:marinespecies.org:taxname:125732  genus      cf. morhua
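As a rough illustration of how a scientificNameID could be checked before submission, the sketch below tests a value against the WoRMS LSID pattern seen in the table above and extracts the trailing number (the AphiaID in WoRMS terminology). The function name is hypothetical, and the pattern is inferred from the two example LSIDs rather than taken from an official grammar.

```python
import re

# Pattern inferred from the example LSIDs above; an assumption, not an official grammar.
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def aphia_id_from_lsid(lsid):
    """Return the number at the end of a WoRMS LSID, or None if the value is malformed."""
    match = WORMS_LSID.match(lsid.strip())
    return int(match.group(1)) if match else None


print(aphia_id_from_lsid("urn:lsid:marinespecies.org:taxname:131495"))
print(aphia_id_from_lsid("Lanice conchilega"))
```

A check like this only confirms the shape of the identifier. It does not confirm that the identifier exists in WoRMS or that it matches the scientificName.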
occurrenceStatus is an important term, because it allows us to distinguish between presence and absence records. We recommend to always fill in this field and to use
A few terms related to quantity,
organismQuantityType, have been recently added to Darwin Core. This is a lot more versatile than the older
organismQuantity should contain the quantity value, and
organismQuantityType the parameter and units. There is a recommended vocabulary for
organismQuantityType which includes values such as
biomassAFDG (biomass ash free dry weight in gram),
percentageCoverage. The quantity terms should be used together with the new sample size related fields.
For stored specimens, the
preparations term can be used to provide the identifier for the record in the collection and to document the preparation and preservation methods.
associatedMedia, associatedReferences and associatedSequences are globally unique identifiers or URIs pointing respectively to associated media (e.g. online image or video), associated literature (e.g. DOIs) or genetic sequence information (e.g. GenBank ID).
The recommended vocabulary for
sex can be found here.
eventID  scientificName     occurrenceStatus  organismQuantity  organismQuantityType
-------  -----------------  ----------------  ----------------  --------------------
1        Abra alba          present           12                organisms
1        Pectinaria koreni  present           48                organisms
2        Abra alba          absent            0                 organisms
2        Pectinaria koreni  present           48                organisms
Record level terms
basisOfRecord is a required field and specifies the nature of the record. Possible values include
institutionCode identifies the institution which owns the data,
collectionCode identifies the collection or dataset within that institute. Collections cannot belong to multiple institutes, so all records within a collection should have the same
catalogNumber is an identifier for the records within the dataset or collection.
occurrenceID should be globally unique. A globally unique identifier could for example be constructed from the
collectionCode and the
institutionCode  collectionCode  catalogNumber  occurrenceID
---------------  --------------  -------------  ------------
VLIZ             NSBS            123            VLIZ_NSB_123
VLIZ             NSBS            456            VLIZ_NSB_456
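One possible way to build such identifiers from the three columns in the table above is sketched below. The function name, the underscore separator and the example values are all hypothetical. Whether the result is truly globally unique depends on the institution and collection codes themselves being unique, which this sketch simply assumes.

```python
def make_occurrence_id(institution_code, collection_code, catalog_number, sep="_"):
    """Join the three record-level codes into one identifier (separator is an assumption)."""
    parts = [str(p).strip() for p in (institution_code, collection_code, catalog_number)]
    if not all(parts):
        raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
    return sep.join(parts)


# Hypothetical records: (institutionCode, collectionCode, catalogNumber)
records = [("INST", "COLL", 1), ("INST", "COLL", 2)]
ids = [make_occurrence_id(*record) for record in records]

# Identifiers must not repeat within a dataset.
if len(ids) != len(set(ids)):
    raise ValueError("duplicate occurrenceID found")
print(ids)
```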
bibliographicCitation allows for providing different citations on record level, while a single citation for the entire dataset needs to be added to the metadata.
modified is the most recent date-time on which the resource was changed. It is required to use the ISO 8601:2004(E) standard; for instructions see Time.
dataGeneralizations refers to actions taken to make the shared data less specific or complete than in its original form. It suggests that alternative data of higher quality may be available on request.
Occurrence coordinates should be provided in decimal degrees on the WGS 84 (EPSG:4326) geodetic datum, along with coordinateUncertaintyInMeters, which is the radius of the smallest circle around the given decimalLatitude and decimalLongitude containing the whole location.
The spatial reference system of decimalLatitude and decimalLongitude should be documented in geodeticDatum. Recommended best practice is to use the EPSG code. Coordinates in degrees/minutes/seconds can be converted to decimal degrees using our coordinates tool. We also provide a tool to check coordinates or to determine coordinates for a location on a map. This tool also allows geocoding location names using marineregions.org.
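The conversion from degrees/minutes/seconds to decimal degrees is simple arithmetic: decimal = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal sketch follows. The function name and signature are hypothetical, and this is not the code behind the coordinates tool.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if degrees < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError("degrees must be non-negative; minutes and seconds must be in [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal degrees.
    return -value if hemisphere in ("S", "W") else value


print(dms_to_decimal(51, 30, 0, "N"))
print(dms_to_decimal(2, 15, 0, "W"))
```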
If the locality of an occurrence is known but not the exact coordinates, we need to use a geocoding service to obtain coordinates. Marine Regions has a search interface for geographic names, and provides coordinates as well as a map of the location. Another option is to use Google Maps: after looking up a location, the decimal coordinates can be found in the page URL.
A Well-Known Text (WKT) representation of the shape of the location can be provided in
footprintWKT. This is particularly useful for tracks, transects, tows, trawls, or when an exact location is not known. WKT strings can be created using our WKT tool. This tool also calculates a midpoint and a radius, which can be used for
Some examples of WKT strings:
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTILINESTRING ((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)),((15 5, 40 10, 10 20, 5 10, 15 5)))
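To give an idea of how a midpoint and radius might be derived from a WKT string, here is a minimal sketch. How the WKT tool actually computes these values is not described here, so everything below is an assumption: the midpoint is taken as the centre of the bounding box, the radius as the great-circle (haversine) distance to the farthest vertex on a spherical Earth. The sketch handles 2D coordinates only and ignores shapes crossing the antimeridian.

```python
import math
import re

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def wkt_vertices(wkt):
    """Extract (lon, lat) pairs from a 2D WKT string; WKT lists x (longitude) first."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    return [(float(x), float(y)) for x, y in pairs]


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def midpoint_and_radius(wkt):
    """Return (longitude, latitude, radius in metres) for the vertices of a WKT shape."""
    points = wkt_vertices(wkt)
    if not points:
        raise ValueError("no coordinates found in WKT string")
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    mid_lon = (min(lons) + max(lons)) / 2
    mid_lat = (min(lats) + max(lats)) / 2
    radius = max(haversine_m(mid_lon, mid_lat, lon, lat) for lon, lat in points)
    return mid_lon, mid_lat, radius


print(midpoint_and_radius("LINESTRING (30 10, 10 30, 40 40)"))
```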
Keep in mind while filling in
maximumDepthInMeters that this should be the depth at which the sample was taken and not the water column depth at that location.
locationID is an identifier for the set of location information (e.g. station ID, MRGID from marineregions).
eventID is an identifier for an event, i.e. something that happened at a certain place and time.
parentEventID is an identifier for a parent event, which must refer to an existing
eventRemarks can hold info on cruise, expedition, research vessel, station etc.
habitat is a category or description of the habitat in which the Event occurred.
The date and time at which an occurrence was recorded goes in
eventDate. This term uses the ISO 8601 standard. OBIS recommends using the extended ISO 8601 format with hyphens.
ISO 8601 dates can represent moments in time at different resolutions, as well as time intervals which use
/ as a separator. Date and time are separated by
T. Times can have a time zone indicator at the end, if this is not the case then the time is assumed to be local time. When a time is UTC, a
Z is added. Some examples of ISO 8601 dates are:
1973-02-28T15:25:00
2005-08-31T12:11+12
1993-01-26T04:39+12/1993-01-26T05:48+12
2008-04-25T09:53
1948-09-13
1993-01/02
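The structure described above (an optional / between the start and end of an interval, T between date and time, and an optional zone indicator at the end) can be inspected with a few string operations. The sketch below is illustrative only. The function name is hypothetical, it does not validate the date or time values themselves, and it does not expand abbreviated interval ends such as the 02 in 1993-01/02.

```python
import re

# A trailing "Z", or a numeric UTC offset such as +12 or -05:30.
ZONE = re.compile(r"(Z|[+-]\d{2}(:?\d{2})?)$")


def describe_event_date(value):
    """Split an eventDate string into its parts and flag time and zone information."""
    start, _, end = value.partition("/")      # "/" separates the two ends of an interval
    _, _, time_part = start.partition("T")    # "T" separates date and time
    return {
        "start": start,
        "end": end or None,
        "is_interval": bool(end),
        "has_time": bool(time_part),
        "has_zone": bool(ZONE.search(time_part)),
        "is_utc": time_part.endswith("Z"),
    }


print(describe_event_date("1993-01-26T04:39+12/1993-01-26T05:48+12"))
print(describe_event_date("1948-09-13"))
```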
Besides year, month and day numbers, ISO 8601 also supports ordinal dates (year and day number within that year) and week dates (year, week, and day number within that week). These dates are less common and have the formats
YYYY-DDD (for example
YYYY-Www-D (for example
ISO 8601 durations should not be used.
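If a source dataset uses ordinal or week dates, they can be converted to the more common calendar form before publishing. The sketch below uses only the Python standard library. The function names and the example values are illustrative, and date.fromisocalendar requires Python 3.8 or later.

```python
from datetime import date, datetime


def ordinal_to_calendar(value):
    """Convert an ordinal date 'YYYY-DDD' to 'YYYY-MM-DD'."""
    return datetime.strptime(value, "%Y-%j").date().isoformat()


def week_to_calendar(value):
    """Convert a week date 'YYYY-Www-D' to 'YYYY-MM-DD'."""
    year, week, day = value.split("-")
    if not week.startswith("W"):
        raise ValueError("week part must look like W01 .. W53")
    return date.fromisocalendar(int(year), int(week[1:]), int(day)).isoformat()


print(ordinal_to_calendar("2015-275"))
print(week_to_calendar("2015-W40-5"))
```

Note that the year of a week date can differ from the calendar year near the turn of the year: the first day of ISO week 1 of 2015 falls in December 2014.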
sampleSizeValue and sampleSizeUnit are very important when an organism quantity is specified. Recommended best practice is to use SI units or non-SI units accepted for use with SI for the
sampleSizeUnit. Examples are
square metre and
For example, in the case of a macrofauna sediment core and meiofauna subsamples:
parentEventID  eventID  scientificName          eventDate   sampleSizeValue  sampleSizeUnit
-------------  -------  ----------------------  ----------  ---------------  -----------------
               1        Abra alba               2015-10-02  0.5              square metre
               1        Lanice conchilega       2015-10-02  0.5              square metre
1              2        Sabatieria pulchra      2015-10-02  10               square centimetre
1              2        Leptolaimus sebastiani  2015-10-02  10               square centimetre
1              3        Pselionema longiseta    2015-10-02  10               square centimetre
1              3        Pselionema simplex      2015-10-02  10               square centimetre
Darwin Core Archive
Darwin Core Archive (DwC-A) is a standard for publishing biodiversity data using Darwin Core. Darwin Core archives contain text files which are logically arranged in a star schema: there is one core file and, optionally, multiple extension files. Core files contain information on taxa, occurrences, or sampling events.
Archives with an Event core will be supported in the near future. With an Event core, some properties can be moved from the occurrence to the event level and no longer have to be repeated for every single occurrence. As each event can point to a parent event (with the parentEventID field), extensive hierarchies of events can be constructed where different fields are only filled in at the appropriate level (for example: cruise > leg > station > sample > subsample).
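To make the idea of filling in fields "only at the appropriate level" concrete, the sketch below looks up a field for an event by walking up the chain of parent events until a value is found. The event identifiers and values are hypothetical, and whether a given consumer of the archive resolves fields in exactly this way is an assumption.

```python
def resolve_field(events, event_id, field):
    """Return the value of `field` for an event, inherited from the nearest ancestor if empty."""
    seen = set()
    current = event_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles in the hierarchy
        event = events.get(current)
        if event is None:
            raise KeyError(f"parentEventID refers to unknown eventID: {current}")
        value = event.get(field)
        if value not in (None, ""):
            return value
        current = event.get("parentEventID")
    return None


# Hypothetical hierarchy: cruise > station > sample
events = {
    "cruise1": {"parentEventID": None, "eventRemarks": "hypothetical cruise"},
    "station1": {"parentEventID": "cruise1", "decimalLatitude": 51.5, "decimalLongitude": 2.5},
    "sample1": {"parentEventID": "station1", "eventDate": "2015-10-02"},
}

print(resolve_field(events, "sample1", "decimalLatitude"))
print(resolve_field(events, "sample1", "eventRemarks"))
```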
The meta.xml descriptor file maps the core and extension files to Darwin Core terms, and describes how the core and extension files are linked.
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files>
      <location>event.txt</location>
    </files>
    <id index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentEventID"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files>
      <location>occurrence.txt</location>
    </files>
    <coreid index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </extension>
  <extension encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact">
    <files>
      <location>measurementorfact.txt</location>
    </files>
    <coreid index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/measurementValue"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/measurementUnit"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/measurementMethod"/>
  </extension>
</archive>
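A descriptor like the one above can be read with Python's standard xml.etree.ElementTree module to see which file holds which terms. The sketch below embeds a shortened copy of the descriptor so that it runs on its own. The function name and the shape of the returned dictionaries are hypothetical choices, not part of the Darwin Core Archive standard.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Shortened copy of the descriptor above, for illustration only.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files><location>event.txt</location></files>
    <id index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentEventID"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <coreid index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
  </extension>
</archive>"""


def describe_archive(xml_text):
    """List each core/extension file with its row type and column-to-term mapping."""
    root = ET.fromstring(xml_text)
    tables = []
    for element in root:
        kind = element.tag.split("}")[-1]  # strip the XML namespace: "core" or "extension"
        location = element.find("dwc:files/dwc:location", NS).text
        columns = {
            int(field.get("index")): field.get("term").rsplit("/", 1)[-1]
            for field in element.findall("dwc:field", NS)
        }
        tables.append({
            "kind": kind,
            "file": location,
            "rowType": element.get("rowType").rsplit("/", 1)[-1],
            "columns": columns,
        })
    return tables


for table in describe_archive(META_XML):
    print(table)
```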