About the data

Where does OBIS get its data?

What kinds of data is OBIS interested in?

What quality control system is in place for data?

Why do some points fall on land?

Will OBIS pay me for my dataset?

Who has intellectual property rights to the data?

How will my data be cited and credited in OBIS?

How do I contribute data to OBIS?

What if I have sensitive data which should not be openly accessible?

If I want to contribute data to the OBIS system, what should I do next?

Data Schema and metadata

What is the OBIS data schema?

What is DiGIR?

Is the OBIS Data Schema compatible with the Darwin Core standard?

Isn't it hard to implement all the OBIS Data Schema fields?

Do I have to use the same field names in my database as the OBIS Schema?

Why do you have fields for places that may not apply in the ocean?

What do you mean by the terms "Collected" and "Observed"?

OBIS Schema data types

Can the Schema accommodate tagging data, or multiple sightings of a single individual animal?

What is metadata?

About databasing

How do I start designing a database?

Do I need to use a relational database?

Which relational database should I use, if I use a relational database?

What hardware and software do I need to serve data through the web?

About the data

Where does OBIS get its data?

OBIS publishes data on behalf of scientists from government agencies, museums, universities, commercial companies, and non-governmental organisations. OBIS is always seeking new contributors.

What kinds of data is OBIS interested in?

OBIS is a marine biogeographic information system, meaning that we concentrate on datasets that record particular species (or higher taxonomic groups) from particular marine locations at particular times. At present, we can only publish data where the locations are recorded as latitude and longitude, not as place names. Our focus is on high taxonomic quality, so datasets where organisms have been identified by professional or trained biologists are our priority. In the near future, we will be expanding to take in environmental datasets (i.e. coverage of physical, chemical, and geological parameters) that are relevant to understanding the distribution of species. We are therefore interested in hearing from potential contributors of these datasets and welcome your contact, but we are still in the process of building this facility.

What quality control system is in place for data?

Data published through OBIS must come from credible, authoritative sources. The scientists and institutions responsible for collecting and managing the data are clearly named. Before publication, the data must pass through a series of technical controls described below, and these are repeated every time the data are crawled again from their source. Any errors, such as species name misspellings, names not recognised in OBIS, and possible mapping errors, are reported to the data provider to review and, if necessary, correct. Thus the next time the data are published they are more accurate, and the quality of the source database is also improved. Data use is a very important way of finding actual and possible errors in data. Users may report such issues directly to the data source or to OBIS.

The OBIS Quality Control protocol is as follows:

  1. If the required data fields are not properly filled, notification will be sent to the Data Provider. No further action will be taken until the required fields are filled.
  2. If fields have questionable values, notification will be sent to the Data Provider. These questionable values will be set as empty in the data published.
  3. Data located on land will be reported to the Data Provider but will not be deleted unless instructed by the Data Provider, because they may represent a species in an estuary or the centre point of a location. If a Data Provider changes the values, new values will show up after the next round of crawling.
  4. If species names cannot be (a) verified against known valid names in OBIS, or (b) matched to the OBIS taxonomic hierarchy, the Data Provider will be notified so they can check that the names are current and correct. Such names will be classified as ‘unassigned’ on the OBIS portal. People can search on these names but they will be noted as not verified. Some non-verified names may be assigned a position in the taxonomic hierarchy by virtue of their genus.
  5. The portal staff will communicate with data providers to inform them of any problems and improve data quality. They will check that the data conform to the metadata description of the dataset; i.e. it should have the correct number of records and species in the right geographic locations. After the data are transferred to the server from which they will be published online, a form email will be sent to the technical person and manager specified, detailing the number of records obtained and any missing records, the time of the next crawl, and any errors identified.
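
As a rough illustration of steps 1 and 2 of the protocol above, the following Python sketch checks one record. It is not the OBIS implementation: the field names, the numeric ranges, and the function name check_record are assumptions made for the example only.

```python
# Illustrative field names; the real names are defined in the OBIS schema files.
REQUIRED_FIELDS = ("scientific_name", "latitude", "longitude", "date_last_modified")

# Assumed plausibility ranges used to flag "questionable" values.
RANGES = {"latitude": (-90.0, 90.0), "longitude": (-180.0, 180.0), "month": (1, 12)}


def check_record(record):
    """Return a report with missing fields, questionable fields, and a cleaned copy."""
    report = {"missing": [], "questionable": [], "cleaned": dict(record)}

    # Step 1: if required fields are not filled, report them and stop.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            report["missing"].append(field)
    if report["missing"]:
        return report

    # Step 2: questionable values are reported and set to empty in the copy
    # that would be published; the provider's own record is left untouched.
    for field, (low, high) in RANGES.items():
        raw = record.get(field)
        if raw in (None, ""):
            continue
        try:
            ok = low <= float(raw) <= high
        except (TypeError, ValueError):
            ok = False
        if not ok:
            report["questionable"].append(field)
            report["cleaned"][field] = None
    return report
```

The report keeps the problems separate from the cleaned copy, which mirrors the idea that errors are sent back to the Data Provider rather than silently fixed.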

Why do some points fall on land?

When plotting data on a map, it soon becomes obvious that some of the points fall on land. Sometimes this is a genuine mistake (and we would be grateful if you could let us know about it!), but in most cases it is deliberate, and a consequence of the way data are extracted from literature. Often, data come to OBIS not with a precise georeference including a latitude/longitude pair, but with the name of a country. If this country is large, there is not always a rational way to place the observation at a point location that falls in the ocean. This is especially true for countries that border different seas or oceans. Think of a species recorded from ‘USA’: there is no rational way of deciding in which ocean to place this observation, since the USA borders two vastly different oceans. Nor is it at all certain that a single observation for ‘USA’ means that the species occurs in both the Pacific and the Atlantic Ocean. So we plot the point representing such an observation in the only place we can consistently place it: in the centre of the USA, which obviously is land.


Will OBIS pay me for my dataset?

No, OBIS does not buy data. It is a group of contributors who have agreed to publish their data through a central portal to make it more accessible. However, we may be able to make suggestions for places where you could submit a proposal to fund developing datasets or for digitizing existing datasets.

Who has intellectual property rights to the data?

OBIS claims neither ownership of nor rights to the data sets it publishes. All rights remain with the data source, who may at any time decide to remove their data from OBIS. This is true whether you serve the data yourself, or whether you place your dataset at a Regional OBIS Node or the central OBIS portal for serving.

How will my data be cited and credited in OBIS?

All data published through OBIS are labelled with the organization and database from which the data came, and a standard citation is provided. Users are expected to cite the data providers when using data from OBIS as they would cite papers from a conventional print publication (see How do I cite data ...?).

How do I contribute data to OBIS?

There are two models for sharing data through the OBIS system:

  • You become a distributed data contributor. This means that you keep your dataset locally, and set up a server that can respond to OBIS queries. This requires "mapping" your dataset to the OBIS schema (which is not as hard as it sounds!) and installing a free software package called DiGIR to communicate with the portal. There are more details on this below.

  • You provide your data set in electronic form to a Regional OBIS Node, the central data portal, or another existing Data Provider, and it is published from there.

Which choice is right for you depends on whether you are interested in maintaining your own server, and also whether you expect to be making regular updates to the data set. OBIS prefers groups to be distributed data contributors, because we think it is best for the data contributor, who knows the dataset best, to maintain it. That way you can add data and make corrections directly. But if you cannot or do not wish to set up a server, OBIS is happy to host data. In either case the data will be credited to you.

What if I have sensitive data which should not be openly accessible?

The short answer is that OBIS is committed to free, open, public access to data, so if you have sensitive data you probably don't want to publish it through OBIS (or any other publication). However, there are some particular concerns we may be able to help with. For example, if you are concerned about giving the precise location of a rare or commercial species, then we may be able to represent your data at a lower spatial resolution, or to give a bounding box instead of a point location. If you have data that you would like to publish but want to wait until its analysis is published elsewhere, we can help you set up your dataset appropriately now, but agree not to publish it for a certain amount of time.
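
To make the idea of a lower spatial resolution concrete, here is a small Python sketch that replaces a point with the bounding box of the grid cell that contains it. The grid-cell approach, the one-degree default, and the function names are assumptions for illustration; how OBIS would actually generalise a sensitive location is something to agree with OBIS.

```python
import math


def coarsen_to_cell(lat, lon, cell_size_deg=1.0):
    """Return the bounding box of the grid cell containing the point."""
    south = math.floor(lat / cell_size_deg) * cell_size_deg
    west = math.floor(lon / cell_size_deg) * cell_size_deg
    return {
        "south": south,
        "west": west,
        # Clamp so the box never extends past the pole or the date line.
        "north": min(south + cell_size_deg, 90.0),
        "east": min(west + cell_size_deg, 180.0),
    }


def cell_centre(box):
    """Return a single representative point for the box as (lat, lon)."""
    return ((box["south"] + box["north"]) / 2, (box["west"] + box["east"]) / 2)
```

For example, coarsen_to_cell(-36.85, 174.76) gives a box from 37°S to 36°S and from 174°E to 175°E, and cell_centre of that box is (-36.5, 174.5). A larger cell_size_deg hides the true position more thoroughly.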

If I want to contribute data to the OBIS system, what should I do next?

Please read the information about OBIS on this website, and contact the Executive Director of OBIS.

Data Schema and metadata

What is the OBIS data schema?

The OBIS schema is a list of data fields with names, descriptions, and format notes. It is an extension to the Darwin Core Version 2 standard. When the OBIS portal sends queries out to its distributed data contributors, the portal will request data using these fields and needs to have data returned using these fields. The DiGIR software provides the programming to turn an OBIS query into a search on your particular database, but in order to install DiGIR you need to "map" the OBIS schema fields to the fields in your database. Download the following files for details.

What is DiGIR?

DiGIR is the software through which OBIS communicates with its distributed data contributors; it defines how data are exchanged. When a user of the OBIS portal enters a query (such as 'show me all the locations where the fish, Beryx splendens, has been found'), DiGIR allows the portal to send that query to the data contributors, lets the data contributors translate that query into a search on their local database, and sends the data back to OBIS. More information on DiGIR can be found at digir.sourceforge.net. Please contact OBIS before installing the DiGIR software to ensure that you are using a compatible version and have the OBIS configuration details.

Is the OBIS Data Schema compatible with the Darwin Core standard?

Yes. The OBIS Data Schema was built as an extension to the Darwin Core version 2 standard (http://speciesanalyst.net/docs/dwc/index.html). The Darwin Core is a standard that is used by the Global Biodiversity Information Facility and others. The OBIS Schema also contains some additional fields for holding information that the Darwin Core does not handle. So, if you implement the OBIS Schema in your database, you will be compliant with both the Darwin Core standard and the OBIS standard.

Isn't it hard to implement all the OBIS Data Schema fields?

Think of the OBIS schema as a menu of options. There are only four fields that are required in order to be compatible (latitude, longitude, taxonomic name, and date/time of last modification). For all of the other fields you only need to include them if you want to have that information in your database. If you don't plan to hold a particular type of information, you can leave it out of your database. However, if you do include a type of information covered by a field in the OBIS schema, you should represent it as described in the OBIS schema.

Do I have to use the same field names in my database as the OBIS Schema?

No, you can use any field names you like. When you implement DiGIR, it will ask you to tell it which fields relate to which OBIS Schema field. You should keep track of this and make sure that there is a one-to-one mapping of fields in the OBIS schema to your database and that you use the required format for the field. For example, because the OBIS Schema has separate fields for day, month, and year of the record, it is best to hold these in three separate fields and not in a single date field (or at least have a plan for how to separate the pieces for serving to OBIS).
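
The mapping idea can be sketched in Python as follows. DiGIR has its own configuration for this, which is not shown here; the local column names are hypothetical, and the schema-style names on the right should be checked against the OBIS schema files before use.

```python
from datetime import date

# Hypothetical local column names (left) mapped one-to-one to schema-style
# field names (right). Verify the right-hand names against the OBIS schema.
FIELD_MAP = {
    "species": "ScientificName",
    "lat_dd": "Latitude",
    "lon_dd": "Longitude",
}

# A one-to-one mapping means no schema field is the target of two local fields.
assert len(set(FIELD_MAP.values())) == len(FIELD_MAP)


def to_schema_record(local_row):
    """Rename local fields and split a single date into year, month, and day."""
    record = {schema: local_row[local] for local, schema in FIELD_MAP.items()}
    sampled = local_row["sample_date"]  # assumed to be a datetime.date
    record["YearCollected"] = sampled.year
    record["MonthCollected"] = sampled.month
    record["DayCollected"] = sampled.day
    return record


example = to_schema_record(
    {"species": "Beryx splendens", "lat_dd": -41.5, "lon_dd": 175.0,
     "sample_date": date(1984, 3, 9)}
)
```

Keeping the mapping in one table like FIELD_MAP makes it easy to see at a glance whether every schema field you serve has exactly one source in your database.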

Note that most database software will allow you to do automatic operations on fields. You may prefer to enter your location information as degrees and minutes for latitude and longitude instead of decimal degrees. That's fine, because it will be easy for you to create a database view with a latitude field in decimal degrees calculated from (latitude degrees + (latitude minutes/60)).
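
The calculation is simple enough to show as a short Python function. The hemisphere argument is an addition not mentioned above: it assumes that south and west are stored as negative decimal degrees, which is a common convention but should be confirmed against the schema's format notes.

```python
def to_decimal_degrees(degrees, minutes, hemisphere):
    """Convert degrees and decimal minutes to signed decimal degrees."""
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in the range 0 to 60 (exclusive)")
    value = degrees + minutes / 60
    # Assumed convention: southern and western hemispheres are negative.
    return -value if hemisphere.upper() in ("S", "W") else value
```

For example, 41 degrees 15 minutes south becomes -41.25, and 30 degrees 30 minutes north becomes 30.5.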

It may be easiest for you to implement DiGIR by creating a "view" or query in your database that has all the OBIS schema fields in one table. You may have separate tables for "species names", "observations" etc., but may create one virtual view that does all the joins necessary for the OBIS query. You can also have it do any reformatting required (such as the latitude degrees and minutes calculation described above). Then it will be easy during DiGIR installation to map onto the OBIS schema.
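
A minimal sketch of such a view, using Python's built-in sqlite3 module so that it runs without a database server. The table and column names are hypothetical, the view's column names should be checked against the OBIS schema files, and the example assumes northern and eastern positions so that no sign handling is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species_names (
    species_id INTEGER PRIMARY KEY,
    scientific_name TEXT
);
CREATE TABLE observations (
    obs_id INTEGER PRIMARY KEY,
    species_id INTEGER REFERENCES species_names(species_id),
    lat_deg REAL, lat_min REAL,
    lon_deg REAL, lon_min REAL,
    obs_year INTEGER, obs_month INTEGER, obs_day INTEGER
);

-- One virtual table that does the join and the reformatting in one place.
CREATE VIEW obis_view AS
SELECT s.scientific_name            AS ScientificName,
       o.lat_deg + o.lat_min / 60.0 AS Latitude,
       o.lon_deg + o.lon_min / 60.0 AS Longitude,
       o.obs_year                   AS YearCollected,
       o.obs_month                  AS MonthCollected,
       o.obs_day                    AS DayCollected
FROM observations AS o
JOIN species_names AS s ON s.species_id = o.species_id;

INSERT INTO species_names VALUES (1, 'Beryx splendens');
INSERT INTO observations VALUES (1, 1, 35, 30, 140, 15, 1984, 3, 9);
""")

rows = conn.execute(
    "SELECT ScientificName, Latitude, Longitude FROM obis_view"
).fetchall()
```

Because the view is computed when it is queried, corrections made in the underlying tables show up in it automatically.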

Why do you have fields for places that may not apply in the ocean?

Because these are Darwin Core fields. To be compliant with the Darwin Core, OBIS must allow all Darwin Core fields to be entered. But remember, these are all optional fields. Don't even put them in your database if you don't need them; most OBIS datasets won't. OBIS operates off latitude and longitude locations, which is why these two are the only required locality information.

What do you mean by the terms "Collected" and "Observed"?

The OBIS databases hold information on the locations where different species have been found. The act of finding a species at a place is called a "collection" or an "observation" throughout the schema documentation. These terms are meant to apply very broadly, and include cases where species were literally seen during a visual search, were collected in a sample of any kind (research survey, fisheries catch data, etc.), or are represented by a museum specimen that records the location it came from.

OBIS Schema data types

The schema indicates the data type for each field. These are general categories, and your particular database software may use different terms. Where there are additional restrictions placed on the data format, this will be indicated in the Description.

Can the Schema accommodate tagging data, or multiple sightings of a single individual animal?

Yes, the OBIS Schema can accommodate data from multiple observations of a single individual organism, such as data produced by tagging studies. To implement this, the user should 1) create a record in their database for the individual and use the Catalog Number as a unique identifier for that individual; and 2) each observation for that individual should be entered as a separate record in the database and tied to the individual by setting the Related Catalog Item field equal to the Institution Code, Collection Code, and Catalog Number of the individual record, and the Relationship Type field to 'point observation for tagged animal of.' (see the Technical Resources page for more on the OBIS Schema fields).
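
The two steps can be sketched in Python as below. The field names are written without spaces as an assumption, the institution and collection codes are made up, and the separator used to join the three identifiers is a guess; take the exact names, the joining format, and the exact Relationship Type wording from the schema documentation.

```python
def related_item(institution_code, collection_code, catalog_number, sep=" "):
    """Join the three identifiers of the individual (separator is an assumption)."""
    return sep.join([institution_code, collection_code, catalog_number])


# Step 1: one record for the tagged individual, identified by its Catalog Number.
individual = {
    "InstitutionCode": "EXAMPLE",   # hypothetical institution code
    "CollectionCode": "TAGS",       # hypothetical collection code
    "CatalogNumber": "TURTLE-001",
    "ScientificName": "Caretta caretta",
}


def observation_for(individual, catalog_number, latitude, longitude):
    """Step 2: one separate record per sighting, tied back to the individual."""
    return {
        "InstitutionCode": individual["InstitutionCode"],
        "CollectionCode": individual["CollectionCode"],
        "CatalogNumber": catalog_number,
        "ScientificName": individual["ScientificName"],
        "Latitude": latitude,
        "Longitude": longitude,
        "RelatedCatalogItem": related_item(
            individual["InstitutionCode"],
            individual["CollectionCode"],
            individual["CatalogNumber"],
        ),
        "RelationshipType": "point observation for tagged animal of",
    }


sightings = [
    observation_for(individual, "TURTLE-001-OBS1", 10.0, 20.0),
    observation_for(individual, "TURTLE-001-OBS2", 10.5, 20.5),
]
```

Every sighting has its own Catalog Number, so the records stay unique, while the shared Related Catalog Item value lets a user gather the whole track of one animal.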

What is metadata?

Metadata are information about data records. Some metadata are included in the OBIS data schema and describe features of individual records (e.g. sampling method). Other “discovery” metadata describe a dataset. OBIS requires data providers to supply discovery metadata in advance of data publication, so that the data published can be compared with what was anticipated and so that OBIS is aware of how planned data publication will contribute to it. A standard citation is important to enable users to cite the dataset correctly. Other metadata, such as sampling, taxonomic, geographic, and habitat information, enable OBIS to identify data gaps and to develop further data exploration features, such as a filter that would allow users to select datasets with planktonic data.

OBIS discovery metadata fields are: Dataset name, Citation (so data users can cite the dataset in standard format of author (or editors), title (descriptive), host institution), Taxonomic coverage, Geographic coverage, Temporal coverage, Habitat coverage, Total distribution records, Total number of taxa, Collection method, Data source, Abstract (describes dataset), Scientific Contact (responsible for data collection and accuracy), Technical contact (responsible for data management), Website, Date this form completed, Publications from this data (so users can read these for more details about the data origins and its uses). Where standards exist for metadata these fields comply with them, but no standards yet exist for taxonomic, habitat and sampling metadata. OBIS is collaborating with international initiatives (including MEDI of IODE) to develop marine ecological metadata standards that build on ISO, FGDC, GCMD and others. Collaboration between OBIS and the United States National Aeronautics and Space Administration (NASA) Global Change Master Directory (GCMD) has developed the OBIS Master Directory at GCMD: http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=OBIS&MetadataType=0


About databasing

How do I start designing a database?

First, figure out what information you have or plan to have. If you already have datasets, either in electronic or paper formats, look at the data that are included. Make a list of these fields (i.e. the column headings in your data table). Then go through the OBIS schema. If one or more fields in the OBIS schema cover information that you want to hold, then use those field names and the suggested formats. If one or more fields in the OBIS schema do not apply to your data, just leave them out. If there is additional information that you want to keep in your database that is not covered by the OBIS schema, then you can add additional fields.

Do I need to use a relational database?

A relational database is a class of software that allows you to hold data in linked tables. You do not need to use a relational database, as you can hold your information in a "flat file" such as an Excel worksheet. However, a relational database offers great advantages. First, it lets you enter information more efficiently. For example, you can enter a scientific name once into a table of names. Then for every record or observation you have for that species, you won't need to type that name in again, you can just pick it from a drop-down list. So it is faster and you don't have to worry about making typing or spelling mistakes. In addition, relational databases can be queried in more complex ways than a spreadsheet. For example, you can ask for "all the records for species X that were caught north of 30° north, shallower than 300m, and between 1980 and 1985."
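
The example query above can be tried with Python's built-in sqlite3 module, as in this sketch. The table layout, column names, and sample rows are invented for illustration; the point is only that the species name is stored once and that the three conditions combine in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each scientific name is entered once and referenced by its id.
CREATE TABLE names (
    name_id INTEGER PRIMARY KEY,
    scientific_name TEXT UNIQUE
);
CREATE TABLE records (
    record_id INTEGER PRIMARY KEY,
    name_id INTEGER REFERENCES names(name_id),
    latitude REAL,
    depth_m REAL,
    year INTEGER
);
INSERT INTO names VALUES (1, 'Beryx splendens'), (2, 'Gadus morhua');
INSERT INTO records VALUES
    (1, 1, 35.0, 250.0, 1982),  -- meets every condition
    (2, 1, 25.0, 250.0, 1982),  -- too far south
    (3, 1, 35.0, 450.0, 1982),  -- too deep
    (4, 1, 35.0, 250.0, 1990),  -- outside the date range
    (5, 2, 60.0, 100.0, 1983);  -- different species
""")

query = """
SELECT r.record_id
FROM records AS r
JOIN names AS n ON n.name_id = r.name_id
WHERE n.scientific_name = ?
  AND r.latitude > 30
  AND r.depth_m < 300
  AND r.year BETWEEN 1980 AND 1985
ORDER BY r.record_id
"""
matches = [row[0] for row in conn.execute(query, ("Beryx splendens",))]
```

Only record 1 satisfies all the conditions. The same question asked of a spreadsheet would need several filters applied by hand.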

A note on text files. Delimited text files are good for archiving data (saving it in such a way that someone years from now will probably still be able to get to it), but not very good if you actually want to do things with the dataset, like extract certain data of interest, update it, or serve it.

Which relational database should I use, if I use a relational database?

Several products are available, and many have similar functionality, so this isn't a critical decision. Microsoft Access is a common desktop commercial software package. Microsoft SQL Server, Oracle, Sybase and PostgreSQL are common "industrial-strength" databases. This means that they are designed to be efficient with large volumes of data. Generally, if you expect a dataset with hundreds of thousands of records, then you should consider one of the industrial databases, or one of several available freeware solutions. If you have tens of thousands of records or fewer, then Access should be fine. The trade-off is that the industrial packages tend to cost more and also be a little less user-friendly. PostgreSQL and MySQL are free, open-source relational database packages that are quite good. You can find more information on MySQL at http://www.mysql.com and on PostgreSQL at http://www.postgresql.org/ (note that companies may sell packages that include extra documentation, etc., but the core software is free). PostgreSQL is more powerful than MySQL, especially in its geographic features, which is why many people prefer it to MySQL. But power comes at the expense of more complexity. Whichever one you choose, just make sure it is "ODBC compliant", which means that it can communicate with other sources (for exporting, for serving data, etc.). Most of the relational database packages are ODBC compliant, but some "home-grown" systems are not.

What hardware and software do I need to serve data through the web?

If you want to publish data directly to OBIS, you will need a computer with an operating system that has your database, server software, and DiGIR installed. If you would like to set up your own web page, you will also need to do some programming to create your web pages and the search functions users will access. HTML is the language that web pages are built in; several languages such as Perl/CGI, PHP, Python, or Java can be used to create search forms for users to enter data into, and these will need to include the SQL or other commands that actually search your database. Software such as Dreamweaver helps to make programming web pages easier. For those on a budget, there are some good freeware options. Linux is a free operating system, Apache is a free web server, and MySQL and PostgreSQL (see above) are free relational databases.


Last modified by Edward Vanden Berghe on April 11, 2007.
 
