Add CHEBI:is_a as an exact mapping for biolink:subclass_of

Open vdancik opened this issue 1 year ago • 8 comments

Is your feature request related to a problem? Please describe.
We are mapping the ChEBI hierarchy to Biolink.

Describe the solution you'd like
Add CHEBI:is_a as an exact mapping for biolink:subclass_of.

What working group (or team) did this request originate from?
MolePro
Note: This is relevant for members of NCATS Translator.

vdancik avatar Sep 30 '22 18:09 vdancik

Are you using a particular OBO format parser that creates this term? The issue is that there isn't really a term CHEBI:is_a. There is an is_a tag, which is a built-in part of the OBO file format, not of a particular ontology. The OBO spec (https://owlcollab.github.io/oboformat/doc/obo-syntax.html) defines this as creating an rdfs:subClassOf relationship (or rdfs:subPropertyOf if it is a relation). Maybe you could use rdfs:subClassOf at the point of ingest, which is already mapped to biolink:subclass_of.
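
A minimal sketch of that suggestion, assuming a hypothetical ingest step that sees the raw OBO tag and the stanza type. The function and dictionary names are illustrative only and are not part of any Biolink or MolePro API; the single mapping entry is the one mentioned in this thread.

```python
# Hypothetical ingest-side lookup (illustrative names, not a real API).
# The built-in OBO tag `is_a` is translated to its RDFS meaning first,
# and only then looked up among the Biolink mappings.

# Meaning of `is_a` per stanza type, as described in the comment above.
IS_A_BY_STANZA_TYPE = {
    "Term": "rdfs:subClassOf",
    "Typedef": "rdfs:subPropertyOf",
}

# Assumption: only the mapping named in this thread is listed here.
BIOLINK_EXACT_MAPPINGS = {
    "rdfs:subClassOf": "biolink:subclass_of",
}


def biolink_predicate_for_is_a(stanza_type):
    """Return the Biolink predicate for an `is_a` tag, or None if unknown."""
    rdfs_predicate = IS_A_BY_STANZA_TYPE.get(stanza_type)
    if rdfs_predicate is None:
        return None
    return BIOLINK_EXACT_MAPPINGS.get(rdfs_predicate)


print(biolink_predicate_for_is_a("Term"))
```

With this arrangement no CHEBI:is_a identifier is ever minted, so nothing new needs to be added to the Biolink mappings.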

balhoff avatar Oct 03 '22 15:10 balhoff

Jim is correct

cmungall avatar Oct 03 '22 17:10 cmungall

As I understand it, MolePro does want to use biolink:subclass_of in their transformed KG, but they encounter chebi:is_a in their ingest. So they want a chebi:is_a mapping in biolink so their code can automatically know to pick biolink:subclass_of.

sierra-moxon avatar Oct 03 '22 17:10 sierra-moxon

Thanks @sierra-moxon! This is a page from the ChEBI user manual regarding chebi:is_a: https://docs.google.com/document/d/1_w-DwBdCCOh1gMeeP6yqGzcnkpbHYOa3AGSODe5epcg/edit


codewarrior2000 avatar Oct 03 '22 17:10 codewarrior2000

In that doc I only see is_a, not chebi:is_a. I think this is an issue to be dealt with in the code parsing the CHEBI OBO file.

balhoff avatar Oct 03 '22 17:10 balhoff

OK, so CHEBI:is_substituent_group_from is not in the doc, yet it is listed as a narrow mapping for biolink:part_of https://biolink.github.io/biolink-model/docs/part_of.html#relation-part_of. What was the reason for that?

codewarrior2000 avatar Oct 03 '22 17:10 codewarrior2000

The URI is incorrect, but ChEBI does indeed declare its own local object property for that.

cmungall avatar Oct 03 '22 17:10 cmungall

In the file you will see usages like this:

[Term]
id: CHEBI:58957
name: carboxylatoacetyl group
subset: 3_STAR
def: "The substituent group formed from malonate(1-) ion." []
property_value: http://purl.obolibrary.org/obo/chebi/mass "86.04620" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/formula "C3H2O3" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/charge "-1" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/monoisotopicmass "86.00039" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/smiles "C(=O)([O-])CC(=O)*" xsd:string
is_a: CHEBI:27207
relationship: is_substituent_group_from CHEBI:30795
is_a: CHEBI:64775

Note how is_substituent_group_from is a value for the relationship tag. This is a property defined in the ontology, not a built-in part of the ontology language. But is_a is part of the file format. The is_substituent_group_from relation is defined lower down, in its own Typedef stanza:

[Typedef]
id: is_substituent_group_from
name: is substituent group from
is_cyclic: false
is_transitive: false
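
The difference between the two tags can be sketched with a small parser. This is a simplified illustration, not a full OBO parser: `parse_term_edges` is a hypothetical helper, it handles a single Term stanza, and it ignores every tag other than id, is_a, and relationship. The example stanza is an abbreviated copy of the one above.

```python
# Simplified sketch: extract edges from one OBO [Term] stanza.
# `is_a` is a built-in tag and becomes rdfs:subClassOf;
# `relationship` carries a relation that the ontology itself defines.

EXAMPLE_STANZA = """\
[Term]
id: CHEBI:58957
name: carboxylatoacetyl group
is_a: CHEBI:27207
relationship: is_substituent_group_from CHEBI:30795
is_a: CHEBI:64775
"""


def parse_term_edges(stanza_text):
    """Return (subject, predicate, object) triples for one Term stanza."""
    subject = None
    edges = []
    for raw_line in stanza_text.splitlines():
        line = raw_line.split(" ! ")[0].strip()  # drop trailing OBO comments
        if ": " not in line:
            continue  # skips the [Term] header and blank lines
        tag, value = line.split(": ", 1)
        if tag == "id":
            subject = value.strip()
        elif tag == "is_a":
            edges.append(("rdfs:subClassOf", value.strip()))
        elif tag == "relationship":
            relation, target = value.split()[:2]
            edges.append((relation, target))
    return [(subject, predicate, obj) for predicate, obj in edges]


for triple in parse_term_edges(EXAMPLE_STANZA):
    print(triple)
```

Only the relationship edge carries an ontology-defined relation name; the two is_a lines never produce a CHEBI-specific predicate.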

You've uncovered another problem in that CHEBI isn't giving the relation a proper identifier, so if we do the expansion for CHEBI:is_substituent_group_from we won't get the same thing that the OBO to OWL mapping gives. :-(
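
The mismatch can be illustrated with a short sketch. It assumes the default ID-to-IRI rules of the OBO format as commonly described (prefixed IDs expand to base + prefix + "_" + local ID, unprefixed IDs expand to base + ontology ID + "#" + ID) and assumes "chebi" as the ontology ID; both function names are hypothetical.

```python
# Sketch of the two expansions, under the assumptions stated above.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def expand_prefixed_curie(curie):
    """Expand a prefixed ID such as CHEBI:30795 the usual OBO way."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def expand_unprefixed_id(local_id, ontology_id):
    """Expand an unprefixed Typedef ID relative to its ontology (assumed rule)."""
    return f"{OBO_BASE}{ontology_id}#{local_id}"


from_curie = expand_prefixed_curie("CHEBI:is_substituent_group_from")
from_typedef = expand_unprefixed_id("is_substituent_group_from", "chebi")

print(from_curie)
print(from_typedef)
print(from_curie == from_typedef)
```

Under these assumptions the two IRIs differ, which is why a CHEBI:is_substituent_group_from mapping would not line up with what the OBO to OWL translation produces.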

I don't mean to be annoying about it; I just want to make sure we don't fill the Biolink file with IDs that only exist due to parsing issues, rather than actual mappings between different nomenclatures.

balhoff avatar Oct 03 '22 17:10 balhoff