bioregistry
How to link raw data to prefix maps?
SSSOM allows embedding prefix maps into the data itself, which reduces the risk of the data becoming decoupled from its prefix map. For example, we can embed a prefix map into our TSV files like this:
# curie_map:
# FOODON: http://purl.obolibrary.org/obo/FOODON_
# KF_FOOD: https://kewl-foodie.inc/food/
# skos: http://www.w3.org/2004/02/skos/core#
# sssom: https://w3id.org/sssom/
# license: https://creativecommons.org/licenses/by/4.0/
subject_id subject_label predicate_id object_id object_label mapping_justification author_id object_source_version mapping_date confidence comment
KF_FOOD:F001 apple skos:exactMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl 2022-05-02 0.95 "We could map to FOODON:03310788 instead to cover sliced apples, but only ""whole"" apple types exist."
KF_FOOD:F002 gala skos:exactMatch FOODON:00003348 Gala apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl 2022-05-02 1.0
KF_FOOD:F003 pink skos:exactMatch FOODON:00004186 Pink apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl 2022-05-02 0.9 "We could map to FOODON:00004187 instead which more specifically refers to ""raw"" Pink apples. Decided against to be consistent with other mapping choices."
KF_FOOD:F004 braeburn skos:exactMatch sssom:NoMapping semapv:ManualMappingCuration orcid:0000-0002-7356-1779 http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl 2022-05-02 1.0
KF_FOOD:F004 braeburn skos:broadMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl 2022-05-02 1.0
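To make the coupling concrete, here is a minimal sketch of how a consumer could read the embedded curie_map back out of the commented header and use it to expand a CURIE. It is not the sssom-py API; the function names are made up for illustration, and it assumes the header keeps its YAML indentation (entries indented deeper than the curie_map key), which the rendering above has lost.

```python
def read_embedded_prefix_map(lines):
    """Collect prefix -> URI prefix pairs from a commented `curie_map` block.

    Hypothetical helper: assumes every metadata line starts with '#' and that
    curie_map entries are indented deeper than the `curie_map:` key itself.
    """
    prefix_map = {}
    block_indent = None  # indentation of the `curie_map:` key while inside the block
    for line in lines:
        if not line.startswith("#"):
            break  # the metadata header ends at the first data line
        content = line[1:].rstrip("\n")
        stripped = content.strip()
        indent = len(content) - len(content.lstrip(" "))
        if stripped == "curie_map:":
            block_indent = indent
            continue
        if block_indent is not None:
            if stripped and indent > block_indent:
                prefix, _, uri_prefix = stripped.partition(":")
                prefix_map[prefix.strip()] = uri_prefix.strip()
            else:
                block_indent = None  # a top-level key such as `license` closes the block
    return prefix_map


def expand(curie, prefix_map):
    """Turn a CURIE into a full URI using the embedded prefix map."""
    prefix, _, local_id = curie.partition(":")
    if prefix not in prefix_map:
        raise KeyError(f"prefix {prefix!r} is not declared in the embedded curie_map")
    return prefix_map[prefix] + local_id


EXAMPLE = """\
# curie_map:
#   FOODON: http://purl.obolibrary.org/obo/FOODON_
#   KF_FOOD: https://kewl-foodie.inc/food/
#   skos: http://www.w3.org/2004/02/skos/core#
#   sssom: https://w3id.org/sssom/
# license: https://creativecommons.org/licenses/by/4.0/
subject_id\tpredicate_id\tobject_id
KF_FOOD:F001\tskos:exactMatch\tFOODON:00002473
"""

prefix_map = read_embedded_prefix_map(EXAMPLE.splitlines())
print(expand("FOODON:00002473", prefix_map))
# http://purl.obolibrary.org/obo/FOODON_00002473
```

Because the map travels with the file, a prefix that is used but not declared fails loudly at expansion time instead of being silently misread.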
In hindsight, choosing the term curie_map was probably not ideal; prefix_map would have been better.
In any case, maybe bioregistry should develop recommendations for how data providers document their use of prefixes. It could be that, say, a link to a context like https://bioregistry.io/context/obo is sufficient, but standardising this would go a long way.
@matentzn maybe this is a good time to deprecate curie_map and create a new field, extended_prefix_map, that can be either 1. a URI to a JSON file or 2. a list of records.
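As a rough sketch of how a reader might handle both forms of the proposed field: extended_prefix_map does not exist in SSSOM today, so everything here is an assumption. The function name is hypothetical, the record keys (prefix, uri_prefix, prefix_synonyms) follow the extended prefix map records used by the curies package, the JSON behind the URI is assumed to be a list of such records, and the "foodon" synonym is invented for illustration.

```python
import json
from urllib.request import urlopen


def load_extended_prefix_map(value):
    """Return a list of EPM records from a URI string or an inline list.

    Hypothetical helper for the proposed `extended_prefix_map` field.
    """
    if isinstance(value, str):
        # Form 1: a URI to a JSON file (assumed to contain a list of records)
        with urlopen(value) as response:
            records = json.load(response)
    else:
        # Form 2: the records are given inline
        records = list(value)
    for record in records:
        if "prefix" not in record or "uri_prefix" not in record:
            raise ValueError(f"record needs 'prefix' and 'uri_prefix': {record!r}")
    return records


# Inline form; the "foodon" synonym is made up for this example.
inline = [
    {
        "prefix": "FOODON",
        "uri_prefix": "http://purl.obolibrary.org/obo/FOODON_",
        "prefix_synonyms": ["foodon"],
    },
    {"prefix": "KF_FOOD", "uri_prefix": "https://kewl-foodie.inc/food/"},
]

records = load_extended_prefix_map(inline)
print([record["prefix"] for record in records])
# ['FOODON', 'KF_FOOD']
```

The URI form keeps the file header short but reintroduces a dependency on an external resource, which is the decoupling risk that embedding was meant to avoid.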
deprecate curie_map
I can't say I am not tempted, but I fear that ship has sailed. This is such a fundamental part of the model that it will take a storm of people to convince me to really get rid of it.
That said, introducing a new field, "external_prefix_map", would be one way to deal with this issue.
To further the debate: what exactly is the reason for allowing EPMs (extended prefix maps) in this context? In my current view, this just complicates processing. What is required here is a bimap, not an EPM.
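To make the distinction concrete, here is a small sketch, not taken from any SSSOM or curies implementation. It treats a bimap as a strict one-to-one mapping between prefixes and URI prefixes, and shows the one thing an EPM adds on top: synonyms that let a non-preferred prefix be rewritten to the preferred one. The helper names and the "foodon" synonym are hypothetical.

```python
def build_bimap(records):
    """Build a one-to-one prefix <-> URI prefix mapping, ignoring any synonyms."""
    prefix_to_uri = {}
    uri_to_prefix = {}
    for record in records:
        prefix, uri_prefix = record["prefix"], record["uri_prefix"]
        if prefix in prefix_to_uri or uri_prefix in uri_to_prefix:
            raise ValueError(f"not one-to-one: {prefix} / {uri_prefix}")
        prefix_to_uri[prefix] = uri_prefix
        uri_to_prefix[uri_prefix] = prefix
    return prefix_to_uri, uri_to_prefix


def standardize_curie(curie, records):
    """EPM-only step: rewrite a synonym prefix to the preferred prefix."""
    prefix, _, local_id = curie.partition(":")
    for record in records:
        if prefix == record["prefix"] or prefix in record.get("prefix_synonyms", []):
            return f"{record['prefix']}:{local_id}"
    return None  # prefix is unknown to the EPM


# The "foodon" synonym is invented for this example.
records = [
    {
        "prefix": "FOODON",
        "uri_prefix": "http://purl.obolibrary.org/obo/FOODON_",
        "prefix_synonyms": ["foodon"],
    },
]

prefix_to_uri, uri_to_prefix = build_bimap(records)
print(uri_to_prefix["http://purl.obolibrary.org/obo/FOODON_"])  # FOODON
print("foodon" in prefix_to_uri)  # False
print(standardize_curie("foodon:00002473", records))  # FOODON:00002473
```

With a bimap, every CURIE in the file has exactly one valid spelling, so expansion and contraction are plain dictionary lookups. The EPM only pays off if a file is allowed to contain non-preferred prefixes that need standardizing first.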
My alternate idea, to extend what's allowed in curie_map, is here: https://github.com/mapping-commons/sssom/issues/339
The suggestion is not unreasonable, but what is the use case of allowing EPMs at all in this context?