Bridging axioms when 2 xrefs point to the same foreign term

Open gouttegd opened this issue 2 years ago • 6 comments

We have quite a few cases where two different Uberon terms have a cross-reference to the same foreign term, as in this example:

[Term]
id: UBERON:0000977
name: pleura
xref: EMAPA:16775

...
[Term]
id: UBERON:0003390
name: mesothelium of pleural cavity
xref: EMAPA:16775

I am unsure about what to do with such cross-references when generating the bridges.

Under the current system (using @cmungall's make-bridge-ontologies-from-xrefs.pl script), this initially results in two frames in the bridge file:

[Term]
id: EMAPA:16775
intersection_of: UBERON:0000977 ! pleura
intersection_of: part_of NCBITaxon:10090


[Term]
id: EMAPA:16775
intersection_of: UBERON:0003390 ! mesothelium of pleural cavity
intersection_of: part_of NCBITaxon:10090

Upon converting the OBO bridge file to OWL, the two frames will be merged, resulting in a single equivalence axiom between EMAPA:16775 and the intersection of UBERON:0000977, UBERON:0003390, and the existential restriction on part_of NCBITaxon:10090:

EquivalentClasses(EMAPA:16775 ObjectIntersectionOf(UBERON:0000977 UBERON:0003390 ObjectSomeValuesFrom(BFO:0000050 NCBITaxon:10090)))

I am not sure this behaviour is correct or expected.

Currently, my SSSOM-based bridge generation process, in the same situation, would generate the two following equivalence axioms instead:

EquivalentClasses(EMAPA:16775 ObjectIntersectionOf(UBERON:0000977 ObjectSomeValuesFrom(BFO:0000050 NCBITaxon:10090)))
EquivalentClasses(EMAPA:16775 ObjectIntersectionOf(UBERON:0003390 ObjectSomeValuesFrom(BFO:0000050 NCBITaxon:10090)))

But I am not convinced this is the correct thing to do either.
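To make the difference between the two behaviours concrete, here is a small Python sketch that builds both outputs from the two xrefs above. It only formats strings in OWL functional syntax; the function names are hypothetical, and this is not how make-bridge-ontologies-from-xrefs.pl or the SSSOM-based pipeline is actually implemented.

```python
from collections import defaultdict

# Hypothetical input: (uberon_id, foreign_id) pairs taken from xref lines.
XREFS = [
    ("UBERON:0000977", "EMAPA:16775"),  # pleura
    ("UBERON:0003390", "EMAPA:16775"),  # mesothelium of pleural cavity
]
TAXON = "NCBITaxon:10090"  # taxon used for the EMAPA bridge in the example
PART_OF = "BFO:0000050"


def restriction() -> str:
    """Existential restriction 'part_of some <taxon>'."""
    return f"ObjectSomeValuesFrom({PART_OF} {TAXON})"


def merged_axioms(xrefs):
    """Mimic the OBO-to-OWL merge: frames sharing an id pool their
    intersection_of lines, giving one axiom per foreign term."""
    genus = defaultdict(list)
    for uberon_id, foreign_id in xrefs:
        genus[foreign_id].append(uberon_id)
    return [
        f"EquivalentClasses({fid} ObjectIntersectionOf({' '.join(ids)} {restriction()}))"
        for fid, ids in genus.items()
    ]


def per_mapping_axioms(xrefs):
    """One axiom per mapping, as in the SSSOM-based process."""
    return [
        f"EquivalentClasses({fid} ObjectIntersectionOf({uid} {restriction()}))"
        for uid, fid in xrefs
    ]


for axiom in merged_axioms(XREFS):
    print(axiom)
for axiom in per_mapping_axioms(XREFS):
    print(axiom)
```

One thing the sketch makes visible: the two per-mapping axioms, taken together, entail that "UBERON:0000977 and part_of some NCBITaxon:10090" and "UBERON:0003390 and part_of some NCBITaxon:10090" are equivalent to each other, since both are equivalent to EMAPA:16775. The merged output does not entail that; it instead defines EMAPA:16775 as the intersection of both Uberon classes.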

From the cases I have seen, I am inclined to think that most, if not all, cases of "2 Uberon terms mapped to the same foreign term" are actually bogus, most likely the result of an editor adding a cross-reference from a Uberon term to a foreign term without realising that another Uberon term was already mapped to that same foreign term.

To generate the bridges, I am considering either:

  • ignoring such cases entirely: do not generate any bridging axioms to a foreign term if there is more than one cross-reference to it;
  • ignoring the second cross-reference only: generate a bridging axiom to a foreign term upon encountering the first cross-reference to it, then ignore any following cross-reference to that same term.

In any case, a warning would be emitted (and possibly a report generated) so that editors know about the conflicting cross-references and can fix them.
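The two options, together with the warning, could look roughly like the following. This is a hypothetical Python sketch operating on (uberon_id, foreign_id) pairs; the function name, the strategy names and the placeholder IDs are made up for illustration.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("bridge")


def select_xrefs(xrefs, strategy="skip_all"):
    """Return the (uberon_id, foreign_id) pairs to bridge.

    skip_all:   drop every xref to a foreign term targeted more than once.
    keep_first: keep the first xref to each foreign term, ignore later ones.
    Either way, a warning is logged for each conflicting foreign term.
    """
    counts = Counter(fid for _, fid in xrefs)
    for fid, n in counts.items():
        if n > 1:
            log.warning("%s is cross-referenced by %d terms", fid, n)

    if strategy == "skip_all":
        return [(uid, fid) for uid, fid in xrefs if counts[fid] == 1]
    if strategy == "keep_first":
        seen = set()
        kept = []
        for uid, fid in xrefs:
            if fid not in seen:
                seen.add(fid)
                kept.append((uid, fid))
        return kept
    raise ValueError(f"unknown strategy: {strategy}")


xrefs = [
    ("UBERON:0000977", "EMAPA:16775"),
    ("UBERON:0003390", "EMAPA:16775"),
    ("UBERON:9999999", "EMAPA:99999"),  # made-up, non-conflicting placeholder
]
print(select_xrefs(xrefs, "skip_all"))
print(select_xrefs(xrefs, "keep_first"))
```

Note that keep_first depends on the order in which xrefs are read, so its output can change if terms are reordered in the source file; skip_all gives the same result regardless of order.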

Thoughts?

gouttegd, Sep 06 '23

Agree with your analysis. Neither option is ideal, but pick one for the short term. Long term, let's just make a report and fix the cross-references. I can help.

cmungall, Sep 06 '23

In total, 273 foreign terms are mapped to more than one Uberon (or CL) term, including some cases where a foreign term is mapped to both a Uberon term and a CL term, and cases where a foreign term is mapped to no fewer than four Uberon terms.

Complete list attached: xrefs.txt

EMAPA is by far the largest offender, with more than a hundred cases involving cross-references to an EMAPA term.
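A report like the attached list can be produced by grouping xrefs by their target. A minimal sketch, assuming the xrefs have already been extracted as (local_id, foreign_id) pairs from Uberon and CL; the function names and the MA placeholder are hypothetical.

```python
from collections import Counter, defaultdict


def duplicated_xrefs(xrefs):
    """Map each foreign term to the sorted Uberon/CL terms that xref it,
    keeping only foreign terms targeted by two or more distinct terms."""
    by_foreign = defaultdict(set)
    for local_id, foreign_id in xrefs:
        by_foreign[foreign_id].add(local_id)  # a set ignores repeated identical xrefs
    return {fid: sorted(ids) for fid, ids in by_foreign.items() if len(ids) > 1}


def count_by_prefix(duplicates):
    """Number of duplicated foreign terms per source ontology (EMAPA, FMA, ...)."""
    return Counter(fid.split(":", 1)[0] for fid in duplicates)


xrefs = [
    ("UBERON:0000977", "EMAPA:16775"),
    ("UBERON:0003390", "EMAPA:16775"),
    ("UBERON:0000065", "FMA:265130"),
    ("UBERON:0001005", "FMA:265130"),
    ("UBERON:0000977", "MA:0000000"),  # made-up single xref, not reported
]
report = duplicated_xrefs(xrefs)
for fid, ids in sorted(report.items()):
    print(fid, " ".join(ids))
print(count_by_prefix(report).most_common())
```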

gouttegd, Sep 06 '23

Here is the list of duplicated cross-references, along with a proposed resolution whenever possible:

https://docs.google.com/spreadsheets/d/1tvi2UR5Sp6yjlLlRj6JkQfDO6Gd6AqwitBRd8FCxiL0/edit?usp=sharing

The “confidence” column means “how confident I am that the proposed resolution is the correct one”.

A large part of the duplicated cross-references concerns terms from deprecated ontologies (e.g. AAO, VHOG). I propose to simply remove them all. They can’t cause any issues when generating the bridges (since we don’t bridge to deprecated ontologies), but they do create noise in the duplicated xrefs report, making it harder to distinguish which duplicated cross-references we should try to fix.

As for the other cases, I’ll start by implementing the proposed solution for cases where I’m at least 80% confident in said solution.
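The two triage rules above (drop xrefs to deprecated ontologies; apply only fixes with a confidence of at least 0.8) amount to two filters. A sketch, assuming the spreadsheet rows are loaded into a simple record type; the field names, the placeholder IDs and the deliberately incomplete set of deprecated prefixes are assumptions, not the actual spreadsheet layout.

```python
from dataclasses import dataclass

# Assumption: only the two deprecated ontologies named above; the real list is longer.
DEPRECATED_PREFIXES = {"AAO", "VHOG"}
CONFIDENCE_THRESHOLD = 0.8  # "at least 80% confident"


@dataclass
class ProposedFix:
    """One row of the review spreadsheet (hypothetical field names)."""
    foreign_id: str
    uberon_id: str
    resolution: str
    confidence: float


def prefix(curie: str) -> str:
    return curie.split(":", 1)[0]


def drop_deprecated(xrefs):
    """Remove xrefs whose target belongs to a deprecated ontology."""
    return [(u, f) for u, f in xrefs if prefix(f) not in DEPRECATED_PREFIXES]


def actionable(fixes, threshold=CONFIDENCE_THRESHOLD):
    """Keep only the proposed fixes confident enough to apply right away."""
    return [fix for fix in fixes if fix.confidence >= threshold]


xrefs = [
    ("UBERON:0000977", "EMAPA:16775"),
    ("UBERON:0000977", "VHOG:0000000"),  # made-up ID, for illustration only
]
fixes = [
    ProposedFix("FMA:265130", "UBERON:0000065", "Remove all xrefs", 0.9),
    ProposedFix("ZFA:0000000", "UBERON:0000000", "Merge terms", 0.5),  # hypothetical row
]
print(drop_deprecated(xrefs))
print([fix.foreign_id for fix in actionable(fixes)])
```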

gouttegd, Nov 21 '23

Deprecated ontologies: Did not review. Happy to remove mappings

ZFA:

I agree with all your calls. An archaeological note here: Phenoscape created TAO by cloning ZFA and pseudogeneralizing all terms, so ZFA:0000347 -> TAO:0000347. Some time later we brought these into uberon, giving UBERON:2x IDs. So in general I would expect these to be trivially correct, although in some cases this process brought in dupes, in which case we should merge as you suggest.

The one oddity here is ZFA:0000347, for which I share your 0.5 confidence.

XAO - agreed

SCTID - did not look

PBA - agreed

NCIT - agreed. Yes, I think the two anal glands terms should be merged

MA - agreed with all

Let's keep the kidney stuff as is: "kidney" is quite generic in Uberon, encompassing 3 structures (GO would probably have us include Malpighian tubules here...), and different vertebrates have different functioning adult kidneys at different stages... Open to better ways of doing this, but that should be a new issue. Your suggested xref change is good.

HBA - agreed

FMA

OK, this one is a bit more complex. In the "OBO version" of FMA, which is an abomination I am responsible for, these terms exist:

  Ontology | FMA ID | Uberon ID | Proposed resolution | Confidence | Comment | Uberon label | FMA label
* FMA | FMA:265130 | UBERON:0000065 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | respiratory tract | Respiratory tract
  FMA | FMA:265130 | UBERON:0001005 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | respiratory airway | Respiratory tract
* FMA | FMA:271599 | UBERON:0009835 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | anterior cingulate cortex | Gray matter of anterior cingulate gyrus
  FMA | FMA:271599 | UBERON:0022438 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | rostral anterior cingulate cortex | Gray matter of anterior cingulate gyrus
  FMA | FMA:272300 | UBERON:0024000 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | cerebellum hemispheric lobule IV | Quadrangular lobule
* FMA | FMA:272300 | UBERON:0036063 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | quadrangular lobule | Quadrangular lobule
  FMA | FMA:293087 | UBERON:0007690 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | early pharyngeal endoderm | Endoderm of pharyngeal arch
* FMA | FMA:293087 | UBERON:0009722 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | entire pharyngeal arch endoderm | Endoderm of pharyngeal arch
  FMA | FMA:293966 | UBERON:0002546 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | cranial placode | Ectodermal placode
* FMA | FMA:293966 | UBERON:0010232 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | placodal ectoderm | Ectodermal placode
* FMA | FMA:293971 | UBERON:0003050 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | olfactory placode | Nasal placode
  FMA | FMA:293971 | UBERON:0009292 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | embryonic nasal process | Nasal placode
* FMA | FMA:321647 | UBERON:0003268 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | tooth of lower jaw | Mandibular tooth
  FMA | FMA:321647 | UBERON:0011594 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | dentary tooth | Mandibular tooth

See: https://github.com/OBOFoundry/OBOFoundry.github.io/issues/21

I suggest we keep the ones I marked with * for now

EMAPA - agreed

A lot of weird historic mappings date back to when EMAPA was only partly generalized from EMAP and there were many indistinguishable concepts with the same label...

neurocranium and chondrocranium - I remember looking into this a long time ago, can look deeper later

cmungall, Nov 22 '23

@cmungall Thank you for this fast review, much appreciated!

Regarding FMA: Wow, I had no idea of the complex history behind it. Reading through the linked issue, this raises the question: what to do about the UBERON/FMA bridge (uberon-bridge-to-fma)?

Currently, that bridge is using “OBO” PURLs (http://purl.obolibrary.org/obo/FMA_12345), so in effect, it’s a bridge to “FMA-OBO” (or “FMA-Lite”, or whatever we want to call it).

We could:

  1. Leave it like that. If purl.obolibrary.org is configured to automatically redirect to the “official” FMA PURLs (as per your July 31st suggestion: https://github.com/OBOFoundry/OBOFoundry.github.io/issues/21#issuecomment-1658633797), maybe this could be enough.
  2. Switch the bridge to use the “official” FMA PURLs (http://purl.org/sig/ont/fma/fma12345).
  3. Maybe have two bridges? The existing one to “FMA-OBO” with the OBO PURLs, and a new one to the “official FMA” with FMA PURLs.
  4. Or, on the contrary, remove the bridge entirely? Given the status of FMA, it’s unclear to me that someone could really need to merge UBERON and FMA, regardless of whether we’re talking about “FMA-OBO” or “official FMA”. Of note, when building composite-metazoan we explicitly exclude FMA and its bridge, so even UBERON itself is not using the bridge.

(For the avoidance of doubt: option 4 is about removing the bridge, not the mappings. We would keep the cross-references to FMA in UBERON no matter what, so anyone who needs to find the FMA term corresponding to a UBERON term could still do so. We would just not provide the bridge that allows merging the two ontologies.)

Option 2 is easily and quickly doable with the new bridge pipeline, and option 3 should not be too difficult either.
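Option 2 is essentially an IRI rewrite. A sketch, assuming the numeric part of the identifier is identical in both schemes (as the FMA_12345 / fma12345 examples suggest); the helper names are hypothetical and this is not the new bridge pipeline's actual code.

```python
import re

# Assumption: the numeric FMA identifier carries over unchanged between schemes.
OBO_FMA_PURL = re.compile(r"^http://purl\.obolibrary\.org/obo/FMA_(\d+)$")
OFFICIAL_FMA_PURL = "http://purl.org/sig/ont/fma/fma{}"


def to_official_fma(iri: str) -> str:
    """Rewrite an OBO-style FMA PURL into the "official" FMA PURL."""
    match = OBO_FMA_PURL.match(iri)
    if match is None:
        raise ValueError(f"not an OBO FMA PURL: {iri}")
    return OFFICIAL_FMA_PURL.format(match.group(1))


def curie_to_official_fma(curie: str) -> str:
    """Same rewrite, starting from an xref CURIE such as FMA:265130."""
    prefix, local_id = curie.split(":", 1)
    if prefix != "FMA" or not local_id.isdigit():
        raise ValueError(f"not an FMA CURIE: {curie}")
    return OFFICIAL_FMA_PURL.format(local_id)


print(to_official_fma("http://purl.obolibrary.org/obo/FMA_12345"))
print(curie_to_official_fma("FMA:265130"))
```

For option 3, the same xrefs would simply be emitted twice, once with each IRI form, into two separate bridge files.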

gouttegd, Nov 22 '23

At first I was strongly inclined to stick with 1 and then do 2 later.

However, there is a good argument for just going with 2 in anticipation of the OBO ticket finally being resolved.

Definitely not 4: projects like HuBMAP are using FMA, and even if they use xrefs rather than bridge files (not 100% sure), I would guess that @dosumis would like to move the infrastructure towards using an ubergraph with FMA loaded, in which case 2 makes sense.

uberon, Nov 22 '23

Can this be closed? It looks like the linked PRs deal with the problem.

dosumis, Apr 15 '24

Indeed.

gouttegd, Apr 15 '24