Missing fields in API from distribution extension

Open MortenHofft opened this issue 1 year ago • 6 comments

Via conversation with @LienReyserhove (please correct and add information as needed): the API response does not seem to include all the fields that were published, e.g. degreeOfEstablishment: established

See example here https://www.gbif.org/species/141117232/verbatim

The API response is missing degreeOfEstablishment and a few of the other relatively new fields added to the distribution extension: https://api.gbif.org/v1/species/141117232/distributions?

MortenHofft avatar May 03 '24 15:05 MortenHofft

Yes, we don't maintain the species API and never adapted to changing standards

mdoering avatar May 03 '24 16:05 mdoering

To give you more context (conclusion below):

The Global Register of Introduced and Invasive Species - Belgium (GRIIS Belgium) results from the aggregation of several thematic source checklists of alien species in Belgium (e.g. the Inventory of alien macroinvertebrates in Flanders, Belgium, the Manual of the Alien Plants of Belgium, etc.). Our workflow to generate GRIIS Belgium (summarized here):

  • Publish 12 thematic source checklists to GBIF
  • Retrieve taxa included in these checklists using the species API
  • Retrieve distribution, species profile and description information for these species using the species API
  • Unify taxa and related information into one unified checklist: GRIIS Belgium

Important Darwin Core fields to include in the source checklists and unified checklists are: eventDate, pathway, degreeOfEstablishment and establishmentMeans. These terms are included in the Species Distribution Extension.

However, the terms pathway and degreeOfEstablishment were only recently added to the Species Distribution Extension. Before, information about the degree of establishment and pathway of introduction was included in the Taxon Description Extension (see this example for the Checklist of Alien herpetofauna in Belgium). When we started to update our source checklists using the new Darwin Core terms in the Species Distribution Extension, we removed the pathway and degree of establishment information from the Taxon Description extension and added them to the Species Distribution Extension. The INBO IPT was updated to include the publication of these new Darwin Core terms. However: as indicated, the species API does not include the new DwC fields. As a result: information about pathway and degree of establishment is lost from the Taxon Description and is not shown in the Species Distribution. More important: the information can't be retrieved and used for our unified checklist.

For example: the checklist of non-native freshwater fishes in Flanders is published including the new DwC fields pathway and degreeOfEstablishment in the Species Distribution extension. This information is missing in GBIF, see this record for Cyprinus carpio

Conclusion: If the species API is not updated with the new DwC terms degreeOfEstablishment and pathway, this information is lost in our automated workflows. These DwC terms were specifically designed to allow these kinds of workflows and analyses further upstream, so it would be a missed chance not to include them in the species API.

LienReyserhove avatar May 14 '24 13:05 LienReyserhove

The data is also indexed in checklistbank, but that distribution definition also does not include these or similar terms. You can access the information as verbatim records though:

https://www.checklistbank.org/dataset/30842/verbatim?term=dwc:degreeOfEstablishment https://api.checklistbank.org/dataset/30842/verbatim?term=dwc:degreeOfEstablishment

Maybe that can help? Not ideal I know, but following and blending different standards is a pain. What if we added the new terms to checklistbank?
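
For scripted use, the ChecklistBank verbatim lookup linked above could be wrapped roughly as follows. This is only a sketch: the helper names `verbatim_url` and `fetch_verbatim_page` are made up for illustration, the URL pattern is copied from the link above, and the shape of the JSON that comes back is not described in this thread, so the result is returned unparsed beyond `json.load`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the link in the comment above.
CLB_API = "https://api.checklistbank.org"


def verbatim_url(dataset_key, term, base_url=CLB_API):
    """Build the verbatim search URL for one dataset and one term.

    `term` uses the prefixed form from the thread, e.g.
    "dwc:degreeOfEstablishment". The colon is kept unescaped so the
    URL matches the one quoted above.
    """
    query = urlencode({"term": term}, safe=":")
    return f"{base_url}/dataset/{dataset_key}/verbatim?{query}"


def fetch_verbatim_page(dataset_key, term):
    """Fetch one page of verbatim records (network call).

    The response structure is an unknown here; inspect it before
    relying on any particular key.
    """
    with urlopen(verbatim_url(dataset_key, term)) as response:
        return json.load(response)
```

Calling `verbatim_url(30842, "dwc:degreeOfEstablishment")` reproduces the API link above; the same helper could be called with `"dwc:pathway"`, assuming that term is indexed the same way.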

mdoering avatar May 14 '24 14:05 mdoering

I have created an issue to support the entire dwc extension in checklistbank: https://github.com/CatalogueOfLife/coldp/issues/81 It is not trivial to embrace different standards for the same thing. It might be better to have a specific invasive distribution extension, so that its content is not mixed with other distributions?

mdoering avatar May 17 '24 15:05 mdoering

Thanks @LienReyserhove

Since GBIF is in the midst of working out what to do with its internal systems (we're shortly going to phase out the current database in favour of the checklistbank system), would an option be to change the workflow to call an additional API?

You are presumably calling https://api.gbif.org/v1/species/141117232/distributions? Perhaps calling https://api.gbif.org/v1/species/141117232/verbatim would be an easy addition to the scripts? (it's still within the realm of the species API and consistent across datasets)

If not, we can look at the timelines for infrastructure change and discuss whether we should add them to the current system.

timrobertson100 avatar May 18 '24 06:05 timrobertson100

Thanks @mdoering and @timrobertson100 for this helpful information.

It appears that we can use the species verbatim API (https://api.gbif.org/v1/species/141117232/verbatim). It would require some changes in our code and make it slightly more complex, but for now this "hack" can still guarantee that our workflow remains intact with the new DwC terms, without implementing infrastructure changes at the GBIF side.
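
A rough sketch of what that extra step might look like, in Python for illustration. It assumes the verbatim response carries an `extensions` object keyed by row-type URI, with each row keyed by full term URI; that shape is an assumption and should be checked against a real response. The function names are hypothetical and not part of any GBIF client.

```python
import json
from urllib.request import urlopen

# Darwin Core local names to recover from the verbatim record.
WANTED_TERMS = ("degreeOfEstablishment", "pathway")


def local_name(term_uri):
    """Return the last path segment of a term URI, e.g. 'pathway'."""
    return term_uri.rstrip("/").rsplit("/", 1)[-1]


def extract_distribution_terms(verbatim, wanted=WANTED_TERMS):
    """Collect the wanted terms from every Distribution extension row.

    Assumes `verbatim` has an "extensions" dict keyed by row-type URI,
    each mapping to a list of rows keyed by term URI (an assumption).
    Rows that contain none of the wanted terms are skipped.
    """
    rows = []
    for row_type, records in verbatim.get("extensions", {}).items():
        if local_name(row_type) != "Distribution":
            continue
        for record in records:
            found = {
                local_name(term): value
                for term, value in record.items()
                if local_name(term) in wanted
            }
            if found:
                rows.append(found)
    return rows


def fetch_verbatim(usage_key, base_url="https://api.gbif.org/v1"):
    """Fetch the verbatim record for one name usage (network call)."""
    with urlopen(f"{base_url}/species/{usage_key}/verbatim") as response:
        return json.load(response)
```

In the existing workflow this would sit next to the current distributions call, e.g. `extract_distribution_terms(fetch_verbatim(141117232))`, with the result merged into the distribution information already retrieved.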

It would be good to know when you are planning the transition to checklistbank.

LienReyserhove avatar May 21 '24 09:05 LienReyserhove