Bridging axioms when 2 xrefs point to the same foreign term
We have quite a few cases where two different Uberon terms have a cross-reference to the same foreign term, as in this example:
[Term]
id: UBERON:0000977
name: pleura
xref: EMAPA:16775
...
[Term]
id: UBERON:0003390
name: mesothelium of pleural cavity
xref: EMAPA:16775
I am unsure about what to do with such cross-references when generating the bridges.
Under the current system (using @cmungall’s make-bridge-ontologies-from-xrefs.pl script), this will result initially in two frames in the bridge file:
[Term]
id: EMAPA:16775
intersection_of: UBERON:0000977 ! pleura
intersection_of: part_of NCBITaxon:10090
[Term]
id: EMAPA:16775
intersection_of: UBERON:0003390 ! mesothelium of pleural cavity
intersection_of: part_of NCBITaxon:10090
Upon converting the OBO bridge file to OWL, the two frames will be merged, resulting in a single equivalence axiom between EMAPA:16775 and the intersection of UBERON:0000977, UBERON:0003390, and the existential restriction on part_of NCBITaxon:10090:
EquivalentClasses(EMAPA:16775 ObjectIntersectionOf(UBERON:0000977 UBERON:0003390 ObjectSomeValuesFrom(BFO:0000050 NCBITaxon:10090)))
I am not sure this behaviour is correct or expected.
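To make the merge effect concrete, here is a minimal Python sketch. It is not the actual OBO-to-OWL converter: the `FRAMES` tuple layout and the `merge_frames` helper are assumptions made for illustration, and the sketch only mimics the observable result, namely that frames sharing an ID are folded into one intersection.

```python
from collections import defaultdict

# Bridge frames as (foreign term, Uberon term, taxon) triples, taken from the
# example above. This tuple layout is an assumption made for the sketch.
FRAMES = [
    ("EMAPA:16775", "UBERON:0000977", "NCBITaxon:10090"),
    ("EMAPA:16775", "UBERON:0003390", "NCBITaxon:10090"),
]


def merge_frames(frames):
    """Fold frames that share an ID into a single equivalence axiom per ID."""
    genera = defaultdict(list)  # foreign ID -> Uberon classes in the intersection
    taxa = defaultdict(list)    # foreign ID -> taxa used in part_of restrictions
    for foreign, uberon, taxon in frames:
        if uberon not in genera[foreign]:
            genera[foreign].append(uberon)
        if taxon not in taxa[foreign]:
            taxa[foreign].append(taxon)

    axioms = []
    for foreign, classes in genera.items():
        restrictions = [
            f"ObjectSomeValuesFrom(BFO:0000050 {taxon})" for taxon in taxa[foreign]
        ]
        operands = " ".join(classes + restrictions)
        axioms.append(f"EquivalentClasses({foreign} ObjectIntersectionOf({operands}))")
    return axioms


for axiom in merge_frames(FRAMES):
    print(axiom)
```

With the two frames above, both Uberon classes end up in the same intersection, so the foreign term is asserted to be equivalent to something that is both a pleura and a mesothelium of pleural cavity.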
Currently, my SSSOM-based bridge generation process, in the same situation, would generate the two following equivalence axioms instead:
EquivalentClasses(EMAPA:16775 ObjectIntersectionOf(UBERON:0000977 ObjectSomeValuesFrom(BFO:0000050 NCBITaxon:10090)))
EquivalentClasses(EMAPA:16775 ObjectIntersectionOf(UBERON:0003390 ObjectSomeValuesFrom(BFO:0000050 NCBITaxon:10090)))
But I am not convinced this is the correct thing to do either.
From the cases I have seen, I am inclined to think that most if not all cases of "2 Uberon terms mapped to the same foreign term" are actually bogus, most likely the result of one editor adding a cross-reference on a Uberon term to a foreign term without realising that another Uberon term was already mapped to the same foreign term.
To generate the bridges, I am considering either:
- ignoring such cases entirely: do not generate any bridging axiom to a foreign term if there is more than one cross-reference to it;
- ignoring the second cross-reference only: generate a bridging axiom to a foreign term upon encountering the first cross-reference to it, then ignore any following cross-reference to that same term.
In any case, a warning would be emitted (and possibly a report generated) so that editors know about the conflicting cross-references and can fix them.
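The two options can be sketched as follows. This is only an illustration in Python, not the code of the SSSOM-based pipeline: the `bridge_axioms` function, its `policy` argument, and the `(uberon, foreign)` pair layout are all hypothetical names chosen for the example.

```python
import warnings
from collections import Counter

# (Uberon term, foreign term) pairs from the example above.
MAPPINGS = [
    ("UBERON:0000977", "EMAPA:16775"),
    ("UBERON:0003390", "EMAPA:16775"),
]


def bridge_axioms(mappings, taxon="NCBITaxon:10090", policy="skip_all"):
    """Return one equivalence axiom per retained mapping.

    policy="skip_all":   drop every mapping to a foreign term that is the
                         target of more than one cross-reference.
    policy="keep_first": keep the first mapping seen, ignore the later ones.
    """
    if policy not in ("skip_all", "keep_first"):
        raise ValueError(f"unknown policy: {policy}")

    counts = Counter(foreign for _, foreign in mappings)
    seen = set()
    axioms = []
    for uberon, foreign in mappings:
        if counts[foreign] > 1:
            if foreign not in seen:
                # Warn once per conflicting foreign term.
                warnings.warn(f"{foreign} is cross-referenced by {counts[foreign]} terms")
            if policy == "skip_all" or foreign in seen:
                seen.add(foreign)
                continue
        seen.add(foreign)
        axioms.append(
            f"EquivalentClasses({foreign} ObjectIntersectionOf({uberon} "
            f"ObjectSomeValuesFrom(BFO:0000050 {taxon})))"
        )
    return axioms
```

Note that `keep_first` makes the result depend on the order in which cross-references are encountered, which is one reason neither option is more than a stopgap.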
Thoughts?
Agree with your analysis. Neither option is ideal, but pick one for the short term. Long term, let's just make a report and fix. I can help.
(In reply to Damien Goutte-Gattat, Tue, Sep 5, 2023 at 5:29 PM.)
There are 273 instances in total of foreign terms that are mapped to more than one Uberon (or CL) term, including some cases where a foreign term is mapped to both a Uberon term and a CL term, and cases where a foreign term is mapped to as many as four Uberon terms.
Complete list attached. xrefs.txt
EMAPA is by far the largest offender, with more than a hundred cases involving cross-references to an EMAPA term.
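For reference, a list like this can be produced with a few lines of Python. The sketch below is not the script that generated the attached list. It assumes a plain OBO flat file with `id:` and `xref:` tags inside `[Term]` stanzas, and the `duplicated_xrefs` name is made up for the example.

```python
from collections import defaultdict

SAMPLE = """\
[Term]
id: UBERON:0000977
name: pleura
xref: EMAPA:16775

[Term]
id: UBERON:0003390
name: mesothelium of pleural cavity
xref: EMAPA:16775
"""


def duplicated_xrefs(obo_text):
    """Map each foreign term to the terms cross-referencing it, keeping only
    the foreign terms that are referenced by more than one term."""
    by_foreign = defaultdict(list)
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[4:].strip()
        elif in_term and current_id and line.startswith("xref: "):
            parts = line[6:].split()
            # Keep the bare ID, dropping any trailing qualifier or comment.
            if parts and current_id not in by_foreign[parts[0]]:
                by_foreign[parts[0]].append(current_id)
    return {f: ids for f, ids in by_foreign.items() if len(ids) > 1}


for foreign, terms in duplicated_xrefs(SAMPLE).items():
    print(foreign, "<-", ", ".join(terms))
```

Grouping the keys of the result by prefix (the part before the colon) gives the per-ontology counts.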
Here is the list of duplicated cross-references, along with a proposed resolution whenever possible:
https://docs.google.com/spreadsheets/d/1tvi2UR5Sp6yjlLlRj6JkQfDO6Gd6AqwitBRd8FCxiL0/edit?usp=sharing
The “confidence” column means “how confident I am that the proposed resolution is the correct one”.
A large part of the duplicated cross-references concerns terms from deprecated ontologies (e.g. AAO, VHOG). I propose to simply remove them all. They can’t cause any issues when generating the bridges (since we don’t bridge to deprecated ontologies), but they do create noise in the duplicated xrefs report, making it harder to distinguish which duplicated cross-references we should try to fix.
As for the other cases, I’ll start by implementing the proposed solution for cases where I’m at least 80% confident in said solution.
Deprecated ontologies: Did not review. Happy to remove mappings
ZFA:
I agree with all your calls. An archaeological note here: Phenoscape created TAO by cloning ZFA and pseudogeneralizing all terms, so ZFA:0000347 -> TAO:0000347. Some time later we brought these into uberon, giving UBERON:2x IDs. So in general I would expect these to be trivially correct, although in some cases this process brought in dupes, in which case we should merge as you suggest.
The one oddity here is ZFA:0000347, for which I share your 0.5 confidence.
XAO - agreed
SCTID - did not look
PBA - agreed
NCIT - agreed. Yes, I think the two anal glands terms should be merged
MA - agreed with all
Let's keep the kidney stuff as is: "kidney" is quite generic in uberon, encompassing 3 structures (GO would probably have us include Malpighian tubules here...), and different vertebrates have different adult functioning kidneys at different stages. Open to better ways of doing this, but that should be a new issue. Your suggested xref change is good.
HBA - agreed
FMA
OK this one is a bit more complex. In the "OBO version" of FMA which is an abomination I am responsible for, these terms exist:
| Keep | Ontology | Foreign term | Uberon term | Proposed resolution | Confidence | Comment | Uberon label | Foreign label |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| * | FMA | FMA:265130 | UBERON:0000065 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | respiratory tract | Respiratory tract |
| | FMA | FMA:265130 | UBERON:0001005 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | respiratory airway | Respiratory tract |
| * | FMA | FMA:271599 | UBERON:0009835 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | anterior cingulate cortex | Gray matter of anterior cingulate gyrus |
| | FMA | FMA:271599 | UBERON:0022438 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | rostral anterior cingulate cortex | Gray matter of anterior cingulate gyrus |
| | FMA | FMA:272300 | UBERON:0024000 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | cerebellum hemispheric lobule IV | Quadrangular lobule |
| * | FMA | FMA:272300 | UBERON:0036063 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | quadrangular lobule | Quadrangular lobule |
| | FMA | FMA:293087 | UBERON:0007690 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | early pharyngeal endoderm | Endoderm of pharyngeal arch |
| * | FMA | FMA:293087 | UBERON:0009722 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | entire pharyngeal arch endoderm | Endoderm of pharyngeal arch |
| | FMA | FMA:293966 | UBERON:0002546 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | cranial placode | Ectodermal placode |
| * | FMA | FMA:293966 | UBERON:0010232 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | placodal ectoderm | Ectodermal placode |
| * | FMA | FMA:293971 | UBERON:0003050 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | olfactory placode | Nasal placode |
| | FMA | FMA:293971 | UBERON:0009292 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | embryonic nasal process | Nasal placode |
| * | FMA | FMA:321647 | UBERON:0003268 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | tooth of lower jaw | Mandibular tooth |
| | FMA | FMA:321647 | UBERON:0011594 | Remove all xrefs | 0.9 | FMA term does not seem to exist? | dentary tooth | Mandibular tooth |
See: https://github.com/OBOFoundry/OBOFoundry.github.io/issues/21
I suggest we keep the ones I marked with * for now
EMAPA - agreed
A lot of weird historic mappings date back to when EMAPA was only partly generalized from EMAP and there were many indistinguishable concepts with the same label...
neurocranium and chondrocranium - I remember looking into this a long time ago, can look deeper later
@cmungall Thank you for this fast review, much appreciated!
Regarding FMA: Wow, I had no idea of the complex history behind it. Reading through the linked issue, this raises the question: what to do about the UBERON/FMA bridge (uberon-bridge-to-fma)?
Currently, that bridge is using “OBO” PURLs (http://purl.obolibrary.org/obo/FMA_12345), so in effect, it’s a bridge to “FMA-OBO” (or “FMA-Lite”, or whatever we want to call it).
We could:
- Leave it like that. If purl.obolibrary.org is configured to automatically redirect to the “official” FMA PURLs (as per your July 31st suggestion, https://github.com/OBOFoundry/OBOFoundry.github.io/issues/21#issuecomment-1658633797), maybe this could be enough.
- Switch the bridge to use the “official” FMA PURLs (http://purl.org/sig/ont/fma/fma12345).
- Maybe have two bridges? The existing one to “FMA-OBO” with the OBO PURLs, and a new one to the “official FMA” with FMA PURLs.
- Or, on the contrary, remove the bridge entirely? Given the status of FMA, it’s unclear to me that someone could really need to merge UBERON and FMA, regardless of whether we’re talking about “FMA-OBO” or “official FMA”. Of note, when building composite-metazoan we explicitly exclude FMA and its bridge, so even UBERON itself is not using the bridge.
(For the avoidance of doubt: option 4 is about removing the bridge, not the mappings. We would keep the cross-references to FMA in UBERON no matter what, so if someone needs to find the corresponding FMA term for a UBERON term, they could still do so. We would just not be providing the bridge that allows merging the two ontologies together.)
Option 2 is easily and quickly doable with the new bridge pipeline, and option 3 should not be too difficult either.
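As a rough illustration of what option 2 amounts to, here is a Python sketch of the PURL rewrite. It is not how the new bridge pipeline does it: the `to_official_fma_purl` helper is hypothetical, and the only things taken from this thread are the two PURL patterns quoted above.

```python
import re

# OBO-style FMA PURL, e.g. http://purl.obolibrary.org/obo/FMA_12345
OBO_FMA_PURL = re.compile(r"^http://purl\.obolibrary\.org/obo/FMA_(\d+)$")


def to_official_fma_purl(iri):
    """Rewrite an OBO-style FMA PURL to the "official" FMA form.

    Any IRI that is not an OBO-style FMA PURL is returned unchanged.
    """
    match = OBO_FMA_PURL.match(iri)
    if match is None:
        return iri
    return f"http://purl.org/sig/ont/fma/fma{match.group(1)}"


print(to_official_fma_purl("http://purl.obolibrary.org/obo/FMA_12345"))
```

Option 3 would then come down to emitting the bridge twice, once with the IRIs left as they are and once with each foreign IRI passed through such a rewrite.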
At first I was strongly inclined to stick with 1 and then do 2 later.
However, there is a good argument for just going with 2 in anticipation of the OBO ticket finally being resolved.
Definitely not 4 - projects like HubMap are using FMA, and even if they use xrefs rather than bridge files (not 100% sure), I would guess that @dosumis would like to move the infrastructure towards using an ubergraph with FMA loaded, in which case 2 makes sense
(In reply to Damien Goutte-Gattat, Wed, Nov 22, 2023 at 3:44 AM.)
Can this be closed? It looks like the linked PRs deal with the problem (?)
Indeed.