OBOFoundry.github.io
NCIT and NCBITAXON ids
As part of our great OBO-wide ID sweep, I would like to understand people's positions on the NCIT/NCBITAXON OBO ID vs BioPortal ID question. We see basically two variants of each floating around the OBO sphere:
http://purl.bioontology.org/ontology/NCBITAXON/135663 vs http://purl.obolibrary.org/obo/NCBITaxon_135663
and http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C15958 vs http://purl.obolibrary.org/obo/NCIT_C15958
Obviously, that's not a nice position from an interoperability standpoint! What are people's opinions on the matter?
See related ticket here: https://github.com/Superraptor/GSSO/issues/6
The non-OBO NCIT IRIs do not even resolve, so that is probably the easier discussion!
NCIT OBO Edition is the result of an NCI-funded project for greater OBO compatibility: https://github.com/NCI-Thesaurus/thesaurus-obo-edition
Here is some background: https://medium.com/@MonarchInit/tailoring-the-nci-thesaurus-for-semantic-interoperability-21305ccfe3a6
Discussion on the operations committee thread raised these points, many of which apply to many ontology resources in UMLS:
- NCBI didn't originally create their own resolvable PIDs
- Neither OBO nor BioPortal is the authority.
- Minting identifiers is needed not just for concept identification, but also to look up (by resolving the IRI) information about the concept
- Resolution is usually desired in the registry of the user's choice (which varies by person and their working environment)
- Good options (IMHO) include (A) NCBI provides an identifier scheme and resolver at that namespace. (B) Repositories agree on a common identifier mechanism and resolver service that all use for these non-compliant resources.
- Other options include (C) All repositories agree to use a single repository's namespace for these identifiers. (D) NCBI provides the identifier scheme and continues letting everyone else do the resolver part. (E) Repositories learn which resources don't have their own namespaces, and how to recognize them using all the namespaces that exist to refer to them.
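To make option (E) concrete, here is a minimal sketch of how a tool might recognize the IRI variants quoted earlier in this thread and map them onto a single form. The pattern table, the function names `to_curie` and `to_obo_iri`, and the choice of the OBO PURL as the target are all assumptions for illustration, not an agreed OBO or BioPortal mechanism; only the four IRI shapes shown in this thread are covered.

```python
import re

# IRI shapes taken from the examples in this thread. The table itself is a
# hypothetical illustration; real deployments would need more namespaces.
NAMESPACE_PATTERNS = {
    "NCBITaxon": [
        re.compile(r"^http://purl\.obolibrary\.org/obo/NCBITaxon_(\d+)$"),
        re.compile(r"^http://purl\.bioontology\.org/ontology/NCBITAXON/(\d+)$"),
    ],
    "NCIT": [
        re.compile(r"^http://purl\.obolibrary\.org/obo/NCIT_(C\d+)$"),
        re.compile(r"^http://ncicb\.nci\.nih\.gov/xml/owl/EVS/Thesaurus\.owl#(C\d+)$"),
    ],
}

OBO_BASE = "http://purl.obolibrary.org/obo/"


def to_curie(iri):
    """Return 'PREFIX:LOCALID' for a recognized IRI, or None otherwise."""
    for prefix, patterns in NAMESPACE_PATTERNS.items():
        for pattern in patterns:
            match = pattern.match(iri)
            if match:
                return f"{prefix}:{match.group(1)}"
    return None


def to_obo_iri(iri):
    """Rewrite a recognized IRI to its OBO PURL form; leave others unchanged."""
    curie = to_curie(iri)
    if curie is None:
        return iri
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"
```

With this table, both NCBITaxon variants collapse to the CURIE `NCBITaxon:135663`, and unrecognized IRIs pass through untouched. The obvious cost is that every consumer has to maintain such a table, which is why this sits among the less attractive options.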
It seems to me this is likely to be a recurring theme when reusing semantic resources; I'm encountering Dublin Core (!) not serving some of its own controlled vocabularies using semantic standards. So we probably need to go 'up a level' to solve it in a persistent way.
John
Additionally: we have requirements for an OWL rendering of an organism taxonomy that follows minimal OBO principles, e.g. the use of subClassOf axioms. Most of the ontologies I work on critically depend on this to function.
Some usage experience: we have been using the OBO version of the NCBITaxon IDs a lot, and it has worked very well apart from occasional small issues. I have not used the BioPortal version of the NCBITaxon IDs.