
How to link raw data to prefix maps?

Open matentzn opened this issue 1 year ago • 4 comments

SSSOM allows embedding prefix maps into the data to reduce the risk of the data becoming decoupled from its prefix maps. For example, we can embed prefix maps into our TSV files like this:

# curie_map:
#   FOODON: http://purl.obolibrary.org/obo/FOODON_
#   KF_FOOD: https://kewl-foodie.inc/food/
#   skos: http://www.w3.org/2004/02/skos/core#
#   sssom: https://w3id.org/sssom/
# license: https://creativecommons.org/licenses/by/4.0/
subject_id  subject_label   predicate_id    object_id   object_label    mapping_justification   author_id   object_source_version   mapping_date    confidence  comment
KF_FOOD:F001    apple   skos:exactMatch FOODON:00002473 apple (whole)   semapv:ManualMappingCuration    orcid:0000-0002-7356-1779   http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl    2022-05-02  0.95    "We could map to FOODON:03310788 instead to cover sliced apples, but only ""whole"" apple types exist."
KF_FOOD:F002    gala    skos:exactMatch FOODON:00003348 Gala apple (whole)  semapv:ManualMappingCuration    orcid:0000-0002-7356-1779   http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl    2022-05-02  1.0 
KF_FOOD:F003    pink    skos:exactMatch FOODON:00004186 Pink apple (whole)  semapv:ManualMappingCuration    orcid:0000-0002-7356-1779   http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl    2022-05-02  0.9 "We could map to FOODON:00004187 instead which more specifically refers to ""raw"" Pink apples. Decided against to be consistent with other mapping choices."
KF_FOOD:F004    braeburn    skos:exactMatch sssom:NoMapping     semapv:ManualMappingCuration    orcid:0000-0002-7356-1779   http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl    2022-05-02  1.0 
KF_FOOD:F004    braeburn    skos:broadMatch FOODON:00002473 apple (whole)   semapv:ManualMappingCuration    orcid:0000-0002-7356-1779   http://purl.obolibrary.org/obo/foodon/releases/2022-02-01/foodon.owl    2022-05-02  1.0 
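
To make the coupling concrete, here is a minimal sketch of how a consumer could read the embedded `curie_map` back out of the commented header and use it to expand a CURIE. The function names `read_curie_map` and `expand` are hypothetical, and the hand-rolled parsing is an assumption that only covers the simple two-level header shown above; a real reader would more likely parse the header as YAML.

```python
def read_curie_map(lines):
    """Collect prefix -> URI prefix pairs from the commented header of an SSSOM-style TSV.

    Simplifying assumption: the header is a flat block of '#'-prefixed lines in which
    the entries of `curie_map` are indented under the `curie_map:` key.
    """
    curie_map = {}
    in_map = False
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        body = line[1:].rstrip()
        if body.startswith(" "):
            body = body[1:]  # drop the single space that follows '#'
        if body.strip() == "curie_map:":
            in_map = True
        elif in_map and body.startswith(" "):
            # still indented, so this is an entry of the curie_map block
            prefix, _, uri_prefix = body.strip().partition(": ")
            curie_map[prefix] = uri_prefix
        else:
            in_map = False  # a new top-level key such as `license` closes the block
    return curie_map


def expand(curie, curie_map):
    """Expand a CURIE such as 'FOODON:00002473' into a full URI."""
    prefix, _, local_id = curie.partition(":")
    return curie_map[prefix] + local_id


header = [
    "# curie_map:",
    "#   FOODON: http://purl.obolibrary.org/obo/FOODON_",
    "#   KF_FOOD: https://kewl-foodie.inc/food/",
    "#   skos: http://www.w3.org/2004/02/skos/core#",
    "#   sssom: https://w3id.org/sssom/",
    "# license: https://creativecommons.org/licenses/by/4.0/",
    "subject_id\tsubject_label\tpredicate_id",
]

curie_map = read_curie_map(header)
print(expand("FOODON:00002473", curie_map))
```

Because the map travels inside the file, the expansion above needs no external lookup; a prefix that is missing from the header would raise a `KeyError` in this sketch.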

In hindsight, choosing the term curie_map was probably not ideal; prefix_map would have been better.

In any case, maybe bioregistry should develop recommendations for data providers to document their use of prefixes. It could be that, say, a link to a context like https://bioregistry.io/context/obo is sufficient, but it would go a long way to standardise this.

matentzn avatar Dec 05 '22 18:12 matentzn

@matentzn maybe this is a good time to deprecate curie_map and create a new field extended_prefix_map that can either be 1. a URI to a JSON file or 2. a list of records

cthoyt avatar Nov 27 '23 10:11 cthoyt

> deprecate curie_map

I can't say I am not tempted, but I fear that ship has sailed. This is such a fundamental part of the model that it will require a storm of people to convince me to really get rid of it.

That said, introducing a new field, "external_prefix_map" would be one way to deal with this issue.

To further the debate: what exactly is the reason for allowing EPMs in this context? In my current view, this just complicates processing. What is required here is a bimap, not an EPM.
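
For readers unfamiliar with the distinction, the following is a rough sketch of the two shapes being contrasted. It assumes that an EPM (extended prefix map) is a list of records with the fields `prefix`, `uri_prefix`, `prefix_synonyms`, and `uri_prefix_synonyms`, and that a bimap is a plain dictionary in which every prefix and every URI prefix occurs exactly once. The helper names and the synonym values are made up for illustration.

```python
def epm_to_bimap(records):
    """Keep only the preferred prefix and URI prefix of each record, dropping all synonyms."""
    return {record["prefix"]: record["uri_prefix"] for record in records}


def is_bimap(prefix_map):
    """Return True if no two prefixes share a URI prefix.

    Dictionary keys are unique by construction, so only the values need checking.
    """
    uri_prefixes = list(prefix_map.values())
    return len(uri_prefixes) == len(set(uri_prefixes))


# Illustrative EPM records; the synonym entries are invented for this example.
epm = [
    {
        "prefix": "FOODON",
        "uri_prefix": "http://purl.obolibrary.org/obo/FOODON_",
        "prefix_synonyms": ["foodon"],
        "uri_prefix_synonyms": ["https://example.org/foodon/"],
    },
    {
        "prefix": "skos",
        "uri_prefix": "http://www.w3.org/2004/02/skos/core#",
        "prefix_synonyms": [],
        "uri_prefix_synonyms": [],
    },
]

bimap = epm_to_bimap(epm)
print(bimap)
print(is_bimap(bimap))
```

Under these assumptions the bimap is enough to expand and contract the CURIEs in a single file unambiguously, while the synonyms in the EPM only matter when reconciling data that was written against different prefix conventions.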

matentzn avatar Nov 29 '23 10:11 matentzn

My alternative idea, which extends what's allowed in curie_map, is here: https://github.com/mapping-commons/sssom/issues/339

cthoyt avatar Nov 29 '23 10:11 cthoyt

The suggestion is not unreasonable, but what is the use case of allowing EPMs at all in this context?

matentzn avatar Nov 29 '23 12:11 matentzn