## Biodiversity summaries

### Shannon index

The Shannon entropy was originally proposed by Claude Shannon to quantify the entropy (uncertainty or information content) in strings of text. The idea is that the more different letters there are, and the more equal their proportional abundances in the string of interest, the more difficult it is to correctly predict which letter will be the next one in the string. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction. It is most often calculated as follows:

$$H' = -\sum_{i=1}^{R} p_i \ln p_i$$

where $p_i$ is the proportion of characters belonging to the *i*th type of letter in the string of interest and $R$ is the number of letter types. In ecology, $p_i$ is often the proportion of individuals belonging to the *i*th species in the dataset of interest. The Shannon entropy then quantifies the uncertainty in predicting the species identity of an individual that is taken at random from the dataset. For more information see Wikipedia.

The Shannon diversity index, also known as the Shannon-Wiener diversity index, is defined in OBIS as the sum over all species of `-fi*log(fi)`, with `fi` defined as `ni/n`, where `n` is the total number of records in the raster cell and `ni` is the total number of records for the *i*th species in the raster cell.
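As a minimal sketch of the calculation described above, the following Python function computes the index from a list of species names, one entry per record in a cell. The function name `shannon_index` and the input layout are assumptions made for illustration, and the natural logarithm is assumed for `log`.

```python
import math
from collections import Counter


def shannon_index(species_records):
    """Shannon index for one cell, given one species name per record."""
    counts = Counter(species_records)   # ni for each species
    n = sum(counts.values())            # total number of records in the cell
    if n == 0:
        return None                     # no records, index undefined
    # Sum of -fi * log(fi) over all species, with fi = ni / n.
    return -sum((ni / n) * math.log(ni / n) for ni in counts.values())


# Two species with equal numbers of records give ln(2), about 0.693.
cell = ["cod", "cod", "herring", "herring"]
print(shannon_index(cell))
```

A cell containing a single species yields 0, since there is no uncertainty about the identity of the next record.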

The Shannon index expresses the uncertainty associated with the prediction of the species the next sampled individual belongs to. It assumes that individuals are randomly sampled from an infinitely large community, and that all species are represented in the sample.

Warning: OBIS uses records as a proxy for individuals. In practice, sampling is generally not random, the community is not infinitely large, and not all species are represented in the sample.

### ES50 (Hurlbert index)

The expected number of marine species in a random sample of 50 individuals (records) is an indicator of marine biodiversity richness.

The ES50 is defined in OBIS as `sum(esi)` over all species, with `n` the total number of records in the cell, `ni` the total number of records for the *i*th species, and `esi` calculated per species as follows:

- when `n - ni >= 50`: `esi = 1 - exp(lngamma(n-ni+1) + lngamma(n-50+1) - lngamma(n-ni-50+1) - lngamma(n+1))`
- when `n >= 50`: `esi = 1`
- else: `esi = NULL`
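The per-species rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the OBIS implementation: the function name `es50` and the list-of-counts input are hypothetical, and Python's `math.lgamma` stands in for `lngamma`. The exponential term is the probability that a species is absent from a random draw of 50 records, written with log-gamma functions to avoid overflowing factorials.

```python
from math import exp, lgamma


def es50(counts, size=50):
    """Expected number of species in a random sample of `size` records.

    counts: number of records per species in one cell (ni values).
    Returns None when the cell holds fewer than `size` records.
    """
    n = sum(counts)
    if n < size:
        return None  # corresponds to esi = NULL for every species
    total = 0.0
    for ni in counts:
        if n - ni >= size:
            # 1 minus the probability that the species is missing from the sample.
            total += 1 - exp(
                lgamma(n - ni + 1) + lgamma(n - size + 1)
                - lgamma(n - ni - size + 1) - lgamma(n + 1)
            )
        else:
            # Too few other records to fill the sample: the species is always present.
            total += 1.0
    return total


print(es50([60, 1]))   # 1 + 50/61, roughly 1.82
print(es50([10, 10]))  # None, fewer than 50 records
```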

Warning: ES50 assumes that individuals are randomly distributed, the sample size is sufficiently large, the samples are taxonomically similar, and that all of the samples have been taken in the same manner.

### Simpson

The Simpson index was introduced in 1949 by Edward H. Simpson to measure the degree of concentration when individuals are classified into types.

The measure equals the probability that two entities taken at random from the dataset of interest represent the same type. It equals:

$$\lambda = \sum_{i=1}^{R} p_i^2$$

where $R$ is richness (the total number of types in the dataset) and $p_i$ is the proportional abundance of the *i*th type.

Simpson’s index expresses the probability that any two individuals drawn at random from an infinitely large community belong to the same species. Note that small values are obtained in cells of high diversity and large values in cells of low diversity. This counterintuitive behavior is addressed by the Hill 2 number, which is the inverse of the Simpson index. For more information see Wikipedia.

The Simpson biodiversity index is defined in OBIS as the sum over all species of `(ni/n)^2`, with `n` the total number of records in the cell and `ni` the total number of records for the *i*th species.
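A short sketch of this sum in Python, assuming the per-species record counts for a cell are already available as a list (the function name `simpson_index` is hypothetical):

```python
def simpson_index(counts):
    """Simpson index for one cell from per-species record counts (ni values)."""
    n = sum(counts)  # total number of records in the cell
    if n == 0:
        return None  # no records, index undefined
    return sum((ni / n) ** 2 for ni in counts)


print(simpson_index([5, 5]))   # two equally common species: 0.5
print(simpson_index([10]))     # a single species: 1.0
```

The two calls illustrate the inverse relationship with diversity: the more even and species-rich the cell, the closer the value gets to 0.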

Warning: The Simpson index has the same assumptions as the Shannon index.

### Hill 1

The Hill biodiversity index accounts for species’ relative abundance (number of records in OBIS). Hill1 can be roughly interpreted as the number of species with “typical” abundances, and is a commonly used indicator for marine biodiversity richness. It is defined as:

$$\mathrm{Hill}_1 = \exp(H')$$

where $H'$ is the Shannon index.

Warning: The Hill 1 index has the same assumptions as the Shannon index.

### Hill 2

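The Hill biodiversity index accounts for species’ relative abundance (number of records in OBIS) and discounts rare species. Hill2 can be interpreted as the equivalent number of more dominant species and is therefore less sensitive to sample size than Hill1. The Hill index is a commonly used indicator for marine biodiversity richness. It is defined as:

$$\mathrm{Hill}_2 = \frac{1}{\sum_{i=1}^{R} p_i^2}$$

where $p_i$ is the proportional abundance of the *i*th species and $R$ is the number of species, i.e. Hill2 is the inverse of the Simpson index.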

Warning: The Hill 2 index has the same assumptions as the Shannon index.
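To illustrate how the two Hill numbers relate to the Shannon and Simpson indices, the sketch below computes both from per-species record counts. It assumes that Hill 1 is the exponential of the Shannon index (the standard Hill number of order 1, with the natural logarithm) and that Hill 2 is the inverse of the Simpson index. The function name `hill_numbers` is hypothetical.

```python
import math


def hill_numbers(counts):
    """Return (hill1, hill2) for one cell from per-species record counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return None  # no records, indices undefined
    props = [c / n for c in counts]
    shannon = -sum(p * math.log(p) for p in props)
    simpson = sum(p * p for p in props)
    return math.exp(shannon), 1.0 / simpson


# Four equally common species: both Hill numbers equal the species count.
print(hill_numbers([5, 5, 5, 5]))
# One dominant species and three rare ones: both drop towards 1,
# and Hill 2 drops further because it discounts rare species more strongly.
print(hill_numbers([97, 1, 1, 1]))
```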

### Chao2

The Chao2 marine species richness estimator is a commonly used indicator for marine biodiversity richness (estimated total number of species). It is an estimator of minimum diversity, created for replicated incidence data. It uses the frequency of species occurring either once or twice in a sampling unit (e.g. a spatial grid cell) to estimate the number of undetected species. OBIS has calculated the Chao2 both for Biota (all species) and for Pisces (only fish).

Note that OBIS uses ‘year’ as a proxy for sample and only takes species from samples with collection dates into account.

Formula that is calculated for each cell:

$$\hat{S}_{Chao2} = S_{obs} + \left(\frac{m-1}{m}\right) \frac{Q_1 (Q_1 - 1)}{2 (Q_2 + 1)}$$

where

- $S_{obs}$ is the number of species observed in a set of samples
- $Q_1$ is the number of species reported in only one sampling year
- $Q_2$ is the number of species reported in 2 sampling years
- $m$ is the total number of samples
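The quantities in this list can be derived from the sampling years in which each species was reported. The sketch below is an illustration only: it assumes the bias-corrected form of the Chao2 estimator, S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)), which may differ from the exact variant OBIS applies, and the function name `chao2` and the input layout (species mapped to a set of years) are hypothetical.

```python
def chao2(years_by_species):
    """Bias-corrected Chao2 estimate for one cell (assumed variant).

    years_by_species: dict mapping species name -> set of sampling years
    in which the species was reported in the cell.
    """
    s_obs = len(years_by_species)                 # observed species
    all_years = set()
    for years in years_by_species.values():
        all_years |= set(years)
    m = len(all_years)                            # number of samples (years)
    if m == 0:
        return None
    # Species reported in exactly one and exactly two sampling years.
    q1 = sum(1 for y in years_by_species.values() if len(y) == 1)
    q2 = sum(1 for y in years_by_species.values() if len(y) == 2)
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


cell = {
    "a": {2001},
    "b": {2001, 2002},
    "c": {2003},
    "d": {2001, 2002, 2003},
    "e": {2002},
}
# S_obs = 5, m = 3, Q1 = 3, Q2 = 1, so the estimate is 5 + (2/3) * 6 / 4 = 6.
print(chao2(cell))
```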

When the estimator proves to be unreliable, it is set to null. The reliability of the estimator depends on its confidence interval. When one or both boundaries of the interval are infinite, the estimator cannot be considered reliable. When the confidence interval is too wide, the estimator is also considered unreliable. A minimum reliability criterion is defined in terms of the upper boundary of the confidence interval, the lower boundary of the confidence interval, and the Chao2 estimator itself.

Based on this criterion, no reliable Chao2 estimator could be calculated for certain areas, which show up as (white) gaps on the map.

### Completeness (% of discovered species)

The completeness score is the number of marine species reported per grid cell in OBIS divided by the estimated number of species based on the Chao2 index. It provides a measure of sampling effort and supports the identification of geographical and taxonomical data gaps. OBIS has calculated the completeness score for both Biota (all species) and Pisces (only fish). It is calculated as

$$\text{Completeness} = \frac{S_{obs}}{\hat{S}_{Chao2}}$$

where $S_{obs}$ is the total number of species from samples with collection dates and $\hat{S}_{Chao2}$ is the Chao2 estimator.

Note that OBIS used ‘year’ as a proxy for sample to calculate the Chao2.
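As a small worked example, the ratio can be computed as below. The function name `completeness` is hypothetical, and returning `None` when no reliable Chao2 estimate exists is an assumption that mirrors the way unreliable Chao2 values are set to null.

```python
def completeness(s_obs, chao2_estimate):
    """Observed species count divided by the Chao2 estimate for one cell."""
    if chao2_estimate is None or chao2_estimate <= 0:
        return None  # no reliable estimate, so no completeness score
    return s_obs / chao2_estimate


score = completeness(5, 6.0)
print(f"{score:.1%}")  # 5 observed out of an estimated 6 species: 83.3%
```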

### Red List species

This map represents the most vulnerable areas based on the number of endangered (EN), critically endangered (CR) and vulnerable (VU) marine species occurring in each cell. The Red List categories are based on the IUCN Red List.

### Potentially extinct species

This map represents the number of “pseudo-extinct” species that occurred in each cell, i.e. species with more than 10 records in OBIS that have not been recorded anywhere in OBIS in the past 50 years.
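The selection rule can be sketched as a simple filter over global record years per species. This is an illustration under assumptions: the function name `pseudo_extinct`, the input layout, and the exact handling of the 50-year boundary (a last record strictly older than `reference_year - 50`) are hypothetical choices, not the documented OBIS implementation.

```python
def pseudo_extinct(years_by_species, reference_year, min_records=10, window=50):
    """Species with more than `min_records` records and none in the last `window` years.

    years_by_species: dict mapping species name -> list of record years
    (one entry per record, across the whole of OBIS).
    """
    cutoff = reference_year - window
    return sorted(
        species
        for species, years in years_by_species.items()
        if len(years) > min_records and max(years) < cutoff
    )


records = {
    "A": [1900] * 11,            # enough records, last seen long ago
    "B": [1900] * 11 + [2010],   # enough records, but seen recently
    "C": [1900] * 5,             # too few records
}
print(pseudo_extinct(records, reference_year=2020))  # ['A']
```

Counting, for each cell, how many of the returned species have at least one record in that cell would then give the mapped value.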