- Create a resource
- Upload data
- Map to Darwin Core
- Add metadata
- Publish your data to OBIS
- Publish your data as a dataset paper
Installing Tomcat 8 on Ubuntu
Apache Tomcat 8.0.x requires Java 7 or later.
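Before downloading Tomcat, it can help to confirm that the installed Java meets this requirement. The following is a minimal sketch, not part of the original instructions: the function names `java_major` and `installed_java_version` are hypothetical helpers, and the parsing assumes the usual `java -version` output with a quoted version string such as "1.7.0_80" or "11.0.2".

```shell
# Extract the Java major version from a version string.
# Handles the legacy scheme ("1.7.0_80" -> 7) and the newer one ("11.0.2" -> 11).
java_major() {
  version="$1"
  first="${version%%.*}"            # text before the first dot
  if [ "$first" = "1" ]; then
    rest="${version#*.}"            # drop the leading "1."
    first="${rest%%.*}"             # the second component is the major version
  fi
  first="${first%%[!0-9]*}"         # strip suffixes such as "-ea"
  echo "$first"
}

# Read the version string of the installed JVM.
# `java -version` writes to stderr, e.g.: java version "1.7.0_80"
installed_java_version() {
  java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/'
}

if command -v java >/dev/null 2>&1; then
  major="$(java_major "$(installed_java_version)")"
  if [ -n "$major" ] && [ "$major" -ge 7 ]; then
    echo "Java $major found: sufficient for Tomcat 8.0.x"
  else
    echo "Java version not recognised or older than 7: Tomcat 8.0.x needs Java 7 or later"
  fi
else
  echo "No java executable found on PATH"
fi
```

If no suitable Java is found, install one through your distribution's package manager before continuing.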
First download the latest Tomcat binary distribution from here and unpack:
wget http://www.us.apache.org/dist/tomcat/tomcat-8/v8.0.28/bin/apache-tomcat-8.0.28.tar.gz
tar xvzf apache-tomcat-8.0.28.tar.gz
Create a Tomcat admin user in
<tomcat-users>
  <role rolename="manager-gui"/>
  <role rolename="manager-script"/>
  <role rolename="admin"/>
  <user username="myusername" password="mypassword" roles="admin,manager-gui,manager-script"/>
</tomcat-users>
Make the Tomcat scripts executable and start the server:
chmod +x apache-tomcat-8.0.28/bin/startup.sh
chmod +x apache-tomcat-8.0.28/bin/shutdown.sh
chmod +x apache-tomcat-8.0.28/bin/catalina.sh
./apache-tomcat-8.0.28/bin/startup.sh
Tomcat should now be available at http://127.0.0.1:8080.
Navigate to the Tomcat URL, open the Manager App, and upload the IPT .war file under “Select WAR file to upload”.
Publish your data
With regard to populating the IPT with marine data for OBIS, there are two possible approaches:
Manager driven: You as node manager take the responsibility of describing, checking and uploading the data and metadata to the IPT. The data provider can send you the data ‘as such’ or you can make agreements with your providers on the accepted OBIS data format and standards. This approach will give you a very good knowledge of what data is available. It can be time-consuming, as (extended) communication with the data provider will be necessary to document the metadata and to re-format the data to the OBIS standards.
User driven: You as manager guide (some of) your data providers in describing and uploading the data and metadata to the IPT. Your main task will be to make sure that all relevant information and data for OBIS are available and to perform the necessary quality checks before the data are released to OBIS.
In most cases, there will be a combination of these two approaches. The chosen approach will largely depend on the availability and willingness of your data provider to invest extra time in formatting and thoroughly describing their data. If you, as node manager, prefer a partly user-driven approach, the following steps briefly explain how you or a data provider can upload, standardize and publish a dataset to the OBIS node IPT, without the hassle of installing and maintaining a program. The data are published in your organization’s name. This guide is based on the Canadensys 7-step guide to data publication:
Desmet, P. & C. Sinou. 2012. 7-step guide to data publication. Canadensys. http://www.canadensys.net/data-publication-guide.
Make sure you have obtained the rights from the data owners to publish their data!
Create your resource on the IPT
The Integrated Publishing Toolkit (IPT), developed by GBIF, is an open source web application that can be customized by the OBIS node manager. The IPT-instance is used to publish and register all the [node name] datasets. To be able to create and manage your own dataset (called a “resource”), you will need a user account. Contact your node manager [include name + email] to create one for you.
Once you have your account, log in at the top of the IPT page. Click on the tab Manage resources: it will display all the datasets you are managing and will be empty at first. You can create a new resource at the bottom of the page. Follow the GBIF IPT manual for more detailed instructions. The first thing to complete is the shortname of your resource. This shortname uniquely identifies your resource (that is, your dataset) and will eventually show up in the URL of this resource on the IPT. Shortnames are also used to create folders on the IPT, and they cannot be changed.
We therefore advise that the shortname:
- is unique, descriptive and short (max. 100 characters)
- does not contain any special characters (space, comma, accents, hyphen…)
As a guideline, the shortname can include the acronym of the data-providing institute or project and an indication of the content of the data (taxonomy, geography, time). Use underscores instead of spaces. As each IPT is node-specific, there is no need to include the OBIS node acronym in the shortname. If you have doubts about assigning a shortname, please contact your node manager [name + email].
Shortname good examples:
Shortname bad examples:
- Dataset 1
If you create a test file, please include _test at the end of your shortname.
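The shortname advice above can be checked mechanically before a resource is created. The sketch below is only an illustration under stated assumptions: the function names `is_valid_shortname` and `is_test_shortname` are hypothetical, and the pattern interprets the advice as "letters, digits and underscores only, 1 to 100 characters". The IPT itself may apply different rules.

```shell
# Return 0 if the shortname follows the advice above, 1 otherwise.
# Assumed rule: only ASCII letters, digits and underscores, at most 100 characters.
is_valid_shortname() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9_]{1,100}$'
}

# Return 0 if the shortname is marked as a test resource (ends with _test).
is_test_shortname() {
  case "$1" in
    *_test) return 0 ;;
    *)      return 1 ;;
  esac
}

# Try a few candidates; "zoopl_bpns" and "kielbay70" appear in URLs in this guide,
# the other names are made up for illustration.
for name in "zoopl_bpns" "kielbay70" "Dataset 1" "zoo-plankton" "zoopl_bpns_test"; do
  if is_valid_shortname "$name"; then
    if is_test_shortname "$name"; then
      echo "$name: ok (test resource)"
    else
      echo "$name: ok"
    fi
  else
    echo "$name: not recommended"
  fi
done
```

Running the check with `LC_ALL=C` keeps accented letters from slipping through the `A-Za-z` ranges in some locales.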
The function “Optional archived resource to load” allows you to upload an existing Darwin Core Archive file.
Once you have created your resource, you will see an empty resource overview page.
If you delete a resource, please inform your node manager!
Upload data
Uploading your source file to the IPT is easy: go to your resource overview page > Source Data and click Choose File. You might want to compress (zip) your source file first to improve the upload speed of large files. The IPT will unzip it automatically once received. Follow the IPT manual for more detailed instructions (including the option to use multiple source files or to upload via a direct database connection). Accepted formats are delimited text files (csv, tab and files using any other delimiter), either directly or compressed as zip or gzip.
Once your source file has been uploaded correctly, a source file detail page will be shown, displaying how the IPT has interpreted your file (number of columns, rows, header rows, character encoding, delimiters, etc.). Click the preview button to verify everything is correct, click anywhere on the screen to exit the preview, then click save.
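To compare what the IPT reports with what you expect, you can summarise the source file locally before uploading and compress it at the same time. This is a rough sketch with assumptions: `summarise_source` is a hypothetical helper, the sample file and its rows are made up, and the naive field splitting does not handle quoted fields that contain the delimiter.

```shell
# Summarise a delimited text file: column count (from the header) and data rows.
# Arguments: file path, delimiter (defaults to a comma).
summarise_source() {
  awk -F"${2:-,}" '
    NR == 1 { cols = NF }
    END     { printf "columns=%d rows=%d\n", cols, (NR > 0 ? NR - 1 : 0) }
  ' "$1"
}

# Create a small sample file (made-up rows) in a temporary directory.
workdir="$(mktemp -d)"
src="$workdir/occurrences.csv"
printf '%s\n' \
  'species,lat,lon,date' \
  'Abra alba,51.5,2.5,2010-05-01' \
  'Mytilus edulis,51.3,3.1,2010-05-02' > "$src"

summarise_source "$src"            # prints: columns=4 rows=2

# Write a gzip-compressed copy next to the original and check its integrity.
gzip -c "$src" > "$src.gz"
gzip -t "$src.gz" && echo "compressed copy is intact"
```

If the column and row counts on the IPT source file detail page differ from this summary, the delimiter or header settings are the first things to check.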
Map your data to Darwin Core
Biodiversity data are published in the Darwin Core standard. It includes a list of defined terms and allows your data to be understood and used by others. It also allows an aggregator like OBIS or GBIF to integrate your data with other datasets.
Darwin Core mapping is the process of linking the fields in your resource file with the appropriate Darwin Core terms. It is the most challenging step in publishing your data for two reasons: 1) the list of Darwin Core terms can be overwhelming, so it might be difficult to select the ones that are appropriate for your dataset, and 2) the IPT currently only allows one-to-one mapping of fields, so the ease of mapping will depend on your database structure and on the feasibility of exporting as close to Darwin Core as possible. Contact your node manager or the OBIS secretariat at firstname.lastname@example.org to guide you through the steps, review your mapping, suggest terms etc.
You can find more information regarding Darwin Core mapping in the IPT manual (including core types, extensions, auto-mapping, default values, value translation, etc.) and in the introduction to Darwin Core.
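Exporting your data with column names that already match Darwin Core terms can make the mapping step less laborious. The sketch below renames the header row of a comma-separated file; it is an illustration only. The helper `map_header_to_dwc`, the local column names and the sample row are hypothetical, while `scientificName`, `decimalLatitude`, `decimalLongitude` and `eventDate` are Darwin Core terms. That matching names are picked up by the IPT's auto-mapping is an assumption to verify in the IPT manual.

```shell
# Rewrite the header row of a comma-separated file using "local=DarwinCore" pairs.
# Arguments: file path, space-separated list of pairs.
# Columns without a pair keep their original name; data rows are passed through unchanged.
map_header_to_dwc() {
  awk -F',' -v OFS=',' -v mapping="$2" '
    BEGIN {
      n = split(mapping, pairs, " ")
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        dwc[kv[1]] = kv[2]
      }
    }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i in dwc) $i = dwc[$i]
    }
    { print }
  ' "$1"
}

# Sample file with made-up local column names and one made-up row.
workdir="$(mktemp -d)"
src="$workdir/occurrences.csv"
printf '%s\n' \
  'species,lat,lon,date,remarks' \
  'Abra alba,51.5,2.5,2010-05-01,sample row' > "$src"

mapping='species=scientificName lat=decimalLatitude lon=decimalLongitude date=eventDate'
map_header_to_dwc "$src" "$mapping" > "$workdir/occurrences_dwc.csv"
head -n 1 "$workdir/occurrences_dwc.csv"
# prints: scientificName,decimalLatitude,decimalLongitude,eventDate,remarks
```

The unmapped `remarks` column is left as it is, so it can still be mapped by hand in the IPT.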
Add metadata
Metadata enables users to discover, assess, understand and attribute your dataset for their particular needs, so it pays off to invest some time in providing it.
Go to your resource overview page > Metadata and click Edit to open the metadata editor. Any information you provide here will be visible on the resource homepage and bundled together with your data when you publish.
Publish your data
Go to your resource overview page > Published Release and click Publish. The IPT will now generate your data as Darwin Core, combine it with the metadata and package it as a standardized zip file called a “Darwin Core Archive”. See the IPT manual for more details. Clicking Publish does not make your dataset available to everyone: it is still hidden, with access limited to the resource managers.
Once the Darwin Core Archive is created, inform your node manager, so that they can carry out the necessary quality control on this dataset. For your node manager to be able to look at the dataset, you will need to add them as a “resource manager” to this specific dataset.
Back on the resource overview page > Published Release, you can see the details of your first published dataset, including the publication date and the version number. Since your dataset is published privately, the only thing left to do is to click Visibility > Public (see the IPT manual) to make it available to everyone. Warning: please do not do this for your test dataset.
It is now listed on the IPT homepage and you can share and link to it, e.g. via http://ipt.vliz.be/resource.do?r=kielbay70. This would be a good time to notify any regional or thematic network you are involved in that may also be interested in your dataset.
Your published dataset is a static snapshot of your data and will not change until you upload an updated source file and click publish again. This procedure has the advantage that your dataset is always available, does not require a live internet connection to your database and can be easily shared. It also allows you to control the publication process more precisely: version 1, version 2, etc. and users are informed of how recent the data are (via the last publication date).
include guidelines on how to publish a new version of a dataset
To view an older version of the metadata about the resource, just add the trailing parameter &v=n to the URL, where v stands for “version” and n gets replaced by the version number, e.g., http://ipt.vliz.be/ilvo/resource.do?r=zoopl_bpns&v=1. In this way, specific versions of a resource’s EML, RTF, and DwC-A files can be retrieved. Please note, the IPT’s Archival Mode must be turned on in order for old versions of DwC-A to be stored (see Configure IPT settings section).
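The versioned URL described above can be assembled with a small helper. This is a minimal sketch: `versioned_resource_url` is a hypothetical function name, and the base URL and shortname are taken from the example in this guide.

```shell
# Build the URL of a specific published version of a resource.
# Arguments: IPT base URL, resource shortname, version number.
versioned_resource_url() {
  # ${1%/} drops a trailing slash from the base URL, if present.
  printf '%s/resource.do?r=%s&v=%s\n' "${1%/}" "$2" "$3"
}

versioned_resource_url "http://ipt.vliz.be/ilvo" "zoopl_bpns" 1
# prints: http://ipt.vliz.be/ilvo/resource.do?r=zoopl_bpns&v=1
```

When passing such a URL to a command-line tool, keep it in quotes: an unquoted `&` is interpreted by the shell and would cut off the version parameter.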
Publish your data as a dataset paper
The metadata, expressed in the EML Profile standard, can also be downloaded as a Rich Text Format (RTF) file. This file can serve as a draft manuscript describing the dataset (a “Data Paper”), which can be submitted for peer review to e.g. a Pensoft journal.