Review cycles in mappings
There are some 260 cycles in the mappings, which point to mistakes in those mappings. This matters right now, because we use mappings to create grouping classes in cases where there is a mapping but no common grouping.
Here is an example cycle:
Warning: multiple UPHENO IDs found for group ZP:0004262 (growth decreased rate, abnormal), HP:0008897 (Postnatal growth retardation), MP:0001732 (postnatal growth retardation), UPHENO:0082952 (decreased asexual reproduction), HP:0001508 (Failure to thrive), WBPhenotype:0000031 (slow growth), PHIPO:0001272 (stunted growth), UPHENO:0005431 (decreased growth), DDPHENO:0000394 (decreased growth rate):
-------------------------
Phenotypes:
MP:0001732 (postnatal growth retardation)
PHIPO:0001272 (stunted growth)
UPHENO:0082952 (decreased asexual reproduction)
HP:0008897 (Postnatal growth retardation)
ZP:0004262 (growth decreased rate, abnormal)
WBPhenotype:0000031 (slow growth)
DDPHENO:0000394 (decreased growth rate)
HP:0001508 (Failure to thrive)
UPHENO:0005431 (decreased growth)
Mappings:
UPHENO:0005431 (decreased growth) <-> ZP:0004262 (growth decreased rate, abnormal)
HP:0008897 (Postnatal growth retardation) <-> MP:0001732 (postnatal growth retardation)
DDPHENO:0000394 (decreased growth rate) <-> UPHENO:0082952 (decreased asexual reproduction)
DDPHENO:0000394 (decreased growth rate) <-> HP:0008897 (Postnatal growth retardation)
MP:0001732 (postnatal growth retardation) <-> PHIPO:0001272 (stunted growth)
MP:0001732 (postnatal growth retardation) <-> UPHENO:0005431 (decreased growth)
DDPHENO:0000394 (decreased growth rate) <-> MP:0001732 (postnatal growth retardation)
UPHENO:0005431 (decreased growth) <-> WBPhenotype:0000031 (slow growth)
HP:0001508 (Failure to thrive) <-> UPHENO:0005431 (decreased growth)
So the combination of all mappings leads to two UPHENO IDs ending up in the same cycle, which means at least one mapping is wrong.
For now, I will just randomly choose a UPHENO ID as parent in case I need one for a case that does not have one. But it would be really good if @ar-ibrahim you could prioritise this task a bit and report back to me the mappings that are likely wrong. This is not a fast thing - this task will take many hours, so make sure it's discussed with Ray and James before embarking, to see where it stands on the list of priorities.
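To make the check concrete, here is a minimal sketch of how such groups could be found, assuming the mappings are available as plain pairs of IDs. The function name `find_conflicting_groups` is made up for illustration and this is not necessarily how the pipeline produces the warning above: it collects connected components of the mapping graph and reports those containing more than one UPHENO ID.

```python
from collections import defaultdict

def find_conflicting_groups(mappings):
    """Group IDs connected by mappings; return groups with >1 UPHENO ID."""
    adjacency = defaultdict(set)
    for a, b in mappings:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    conflicts = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search to collect one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        upheno_ids = sorted(n for n in component if n.startswith("UPHENO:"))
        if len(upheno_ids) > 1:
            conflicts.append((component, upheno_ids))
    return conflicts

# The mappings from the example cycle above.
mappings = [
    ("UPHENO:0005431", "ZP:0004262"),
    ("HP:0008897", "MP:0001732"),
    ("DDPHENO:0000394", "UPHENO:0082952"),
    ("DDPHENO:0000394", "HP:0008897"),
    ("MP:0001732", "PHIPO:0001272"),
    ("MP:0001732", "UPHENO:0005431"),
    ("DDPHENO:0000394", "MP:0001732"),
    ("UPHENO:0005431", "WBPhenotype:0000031"),
    ("HP:0001508", "UPHENO:0005431"),
]
conflicts = find_conflicting_groups(mappings)
for component, upheno_ids in conflicts:
    print(len(component), "terms, UPHENO IDs:", upheno_ids)
```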
Here is a log: mapping_cycles.txt
Of course, do not do everything manually - the goal should be to determine rules for wrong mappings that we can just implement computationally.
Anna and I took a look at this today, looking at this example:
Phenotypes:
DDPHENO:0000171 (increased growth rate)
UPHENO:0084229 (increased asexual reproduction)
ZP:0131161 (growth increased rate, abnormal)
UPHENO:0054968 (increased growth)
MP:0002865 (increased growth rate)
Mappings:
MP:0002865 (increased growth rate) <-> UPHENO:0054968 (increased growth)
DDPHENO:0000171 (increased growth rate) <-> MP:0002865 (increased growth rate)
DDPHENO:0000171 (increased growth rate) <-> UPHENO:0084229 (increased asexual reproduction)
UPHENO:0054968 (increased growth) <-> ZP:0131161 (growth increased rate, abnormal)
I'm assuming that the issue is that DDPHENO:0000171 maps to both UPHENO:0054968 (increased growth) [via MP:0002865] and UPHENO:0084229 (increased asexual reproduction)
We think the issue is that, despite having the same label, DDPHENO:0000171 (defined as increased cell proliferation) is NOT equivalent to MP:0002865 (defined as time to reach a developmental stage or stages after birth). Lexical mapping is leading you badly astray here.
MP and ZP both use the GO term growth (def: increase in size or mass of an entire organism or cell) in their EQs. DDPheno doesn't have an EQ, but based on the synonym I think it would be using GO asexual reproduction (https://amigo.geneontology.org/amigo/term/GO:0019954).
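One thing that might help when reviewing these by hand is to list the chain of mappings that connects the two UPHENO IDs, since the wrong mapping has to sit somewhere on that chain. A rough sketch using breadth-first search over the mappings of this example (`path_between` is a hypothetical helper, not something that exists in the pipeline):

```python
from collections import defaultdict, deque

def path_between(mappings, source, target):
    """Return the shortest chain of mapped IDs from source to target, or None."""
    adjacency = defaultdict(set)
    for a, b in mappings:
        adjacency[a].add(b)
        adjacency[b].add(a)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

mappings = [
    ("MP:0002865", "UPHENO:0054968"),
    ("DDPHENO:0000171", "MP:0002865"),
    ("DDPHENO:0000171", "UPHENO:0084229"),
    ("UPHENO:0054968", "ZP:0131161"),
]
path = path_between(mappings, "UPHENO:0054968", "UPHENO:0084229")
print(" <-> ".join(path))
```

Every consecutive pair on the returned path is a candidate for the wrong mapping; here that includes the DDPHENO:0000171 <-> MP:0002865 mapping discussed above.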
Warning: multiple UPHENO IDs found for group FBcv:0000390 (decreased rate of movement), UPHENO:0084424 (decreased rate of kinesthetic behavior), UPHENO:0005521 (decreased multicellular organismal movement), MP:0031392 (hypoactivity):
Phenotypes:
UPHENO:0005521 (decreased multicellular organismal movement)
UPHENO:0084424 (decreased rate of kinesthetic behavior)
MP:0031392 (hypoactivity)
FBcv:0000390 (decreased rate of movement)
Mappings:
FBcv:0000390 (decreased rate of movement) <-> UPHENO:0005521 (decreased multicellular organismal movement)
FBcv:0000390 (decreased rate of movement) <-> MP:0031392 (hypoactivity)
MP:0031392 (hypoactivity) <-> UPHENO:0084424 (decreased rate of kinesthetic behavior)
The issue here seems to be the relation between GO multicellular organismal movement (GO:0050879) and NBO kinesthetic behavior (NBO:0000338).
- kinesthetic behavior falls under the GO behavior branch - definition: 'Movement behavior of the body or its parts'
- multicellular organismal movement falls under multicellular organismal process - definition: 'Any physiological process involved in changing the position of a multicellular organism or an anatomical part of a multicellular organism.'
I'm not sure the GO definition is really correct, as they don't list any of the processes involved in movement under this term; they just have terms for more specific types of movement. It seems likely that the GO and NBO terms represent the same concept.
Warning: multiple UPHENO IDs found for group HP:0000104 (Renal agenesis), UPHENO:0008593 (absent kidney), MP:0000520 (absent kidney), XPO:0125089 (absent kidney), UPHENO:0026980 (absent kidney in the renal system), HP:0010958 (Bilateral renal agenesis):
Phenotypes:
XPO:0125089 (absent kidney)
UPHENO:0026980 (absent kidney in the renal system)
MP:0000520 (absent kidney)
HP:0010958 (Bilateral renal agenesis)
HP:0000104 (Renal agenesis)
UPHENO:0008593 (absent kidney)
Mappings:
MP:0000520 (absent kidney) <-> UPHENO:0026980 (absent kidney in the renal system)
HP:0000104 (Renal agenesis) <-> MP:0000520 (absent kidney)
MP:0000520 (absent kidney) <-> XPO:0125089 (absent kidney)
HP:0000104 (Renal agenesis) <-> UPHENO:0008593 (absent kidney)
UPHENO:0008593 (absent kidney) <-> XPO:0125089 (absent kidney)
HP:0010958 (Bilateral renal agenesis) <-> MP:0000520 (absent kidney)
HP:0000104 (Renal agenesis) <-> XPO:0125089 (absent kidney)
Couple of issues here
- MP absent kidney is exact match to HP:0010958 (Bilateral renal agenesis) - absence of BOTH kidneys
- HP:0000104 (Renal agenesis) is absence of one OR both kidneys - should really be under UPHENO decreased number of kidney
- I don't see why we have both UPHENO:0026980 (absent kidney in the renal system) and UPHENO:0008593 (absent kidney). We really should not have absent X in system Y: if X is absent and X is part of system Y, then X will always be absent from the system. We would only want to use absent X in location Y for specific tissues, not larger systems.
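If we wanted to find other terms following the "absent X in the Y" pattern, a simple label check could produce a review list. This is only a sketch: the regular expression and the helper name `flag_absent_in_labels` are my own, and a flagged term is a candidate for review, not automatically wrong (the pattern is still fine for specific tissues).

```python
import re

# Matches labels of the form "absent <entity> in the <location>".
ABSENT_IN_PATTERN = re.compile(r"^absent (?P<entity>.+?) in the (?P<location>.+)$")

def flag_absent_in_labels(labels):
    """Return {term_id: (entity, location)} for labels matching the pattern."""
    flagged = {}
    for term_id, label in labels.items():
        match = ABSENT_IN_PATTERN.match(label)
        if match:
            flagged[term_id] = (match["entity"], match["location"])
    return flagged

labels = {
    "UPHENO:0026980": "absent kidney in the renal system",
    "UPHENO:0008593": "absent kidney",
}
flagged = flag_absent_in_labels(labels)
print(flagged)
```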
Warning: multiple UPHENO IDs found for group UPHENO:0081236 (skeleton of lower jaw hypoplasia), UPHENO:0069249 (decreased size of the mandible), UPHENO:0069414 (decreased size of the jaw skeleton), ZP:0003955 (ventral mandibular arch hypoplastic, abnormal), HP:0000347 (Micrognathia), UPHENO:0081314 (mandible hypoplasia), ZP:0001965 (mandibular arch skeleton decreased size, abnormal), MP:0004592 (small mandible), MP:0002639 (micrognathia), MP:0000460 (mandible hypoplasia):
Phenotypes:
HP:0000347 (Micrognathia)
MP:0000460 (mandible hypoplasia)
UPHENO:0081236 (skeleton of lower jaw hypoplasia)
UPHENO:0069414 (decreased size of the jaw skeleton)
MP:0002639 (micrognathia)
UPHENO:0069249 (decreased size of the mandible)
UPHENO:0081314 (mandible hypoplasia)
MP:0004592 (small mandible)
ZP:0001965 (mandibular arch skeleton decreased size, abnormal)
ZP:0003955 (ventral mandibular arch hypoplastic, abnormal)
Mappings:
HP:0000347 (Micrognathia) <-> MP:0004592 (small mandible)
MP:0004592 (small mandible) <-> UPHENO:0069249 (decreased size of the mandible)
HP:0000347 (Micrognathia) <-> MP:0000460 (mandible hypoplasia)
MP:0000460 (mandible hypoplasia) <-> UPHENO:0081236 (skeleton of lower jaw hypoplasia)
UPHENO:0069414 (decreased size of the jaw skeleton) <-> ZP:0001965 (mandibular arch skeleton decreased size, abnormal)
MP:0002639 (micrognathia) <-> UPHENO:0069414 (decreased size of the jaw skeleton)
HP:0000347 (Micrognathia) <-> MP:0002639 (micrognathia)
UPHENO:0081236 (skeleton of lower jaw hypoplasia) <-> ZP:0003955 (ventral mandibular arch hypoplastic, abnormal)
HP:0000347 (Micrognathia) <-> UPHENO:0081314 (mandible hypoplasia)
This is a mess.
MP has small mandible, mandible hypoplasia, short mandible, and micrognathia (child of abnormal jaw morphology, defined as abnormally reduced size of the jaws, especially of the mandible).
HP has micrognathia - defined as Developmental hypoplasia of the mandible.
The MP and HP terms are manually mapped in the MGI SSSOM file. Would be better to use the manual mappings in this case. It looks like the lexical stuff is messing us up.
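A possible way to implement "manual beats lexical" is sketched below, assuming each mapping is a record with the SSSOM slots `subject_id`, `object_id` and `mapping_justification`. The IDs in the example are placeholders (`HP:x`, `MP:x1`, ...), because which of the HP/MP mappings above is the curated one is not established in this thread. The rule drops a lexical mapping whenever either of its terms already has a manually curated mapping into the other term's ontology.

```python
MANUAL = "semapv:ManualMappingCuration"
LEXICAL = "semapv:LexicalMatching"

def prefix(curie):
    """Ontology prefix of a CURIE, e.g. 'HP' for 'HP:0000347'."""
    return curie.split(":", 1)[0]

def prefer_manual(mappings):
    """Drop lexical mappings that compete with a manual mapping."""
    # (term, other ontology) pairs already covered by a manual mapping.
    covered = set()
    for m in mappings:
        if m["mapping_justification"] == MANUAL:
            covered.add((m["subject_id"], prefix(m["object_id"])))
            covered.add((m["object_id"], prefix(m["subject_id"])))

    kept = []
    for m in mappings:
        overridden = (
            (m["subject_id"], prefix(m["object_id"])) in covered
            or (m["object_id"], prefix(m["subject_id"])) in covered
        )
        if m["mapping_justification"] == LEXICAL and overridden:
            continue
        kept.append(m)
    return kept

# Placeholder records for illustration only.
mappings = [
    {"subject_id": "HP:x", "object_id": "MP:x1", "mapping_justification": MANUAL},
    {"subject_id": "HP:x", "object_id": "MP:x2", "mapping_justification": LEXICAL},
    {"subject_id": "HP:x", "object_id": "UPHENO:y", "mapping_justification": LEXICAL},
]
kept = prefer_manual(mappings)
print([(m["subject_id"], m["object_id"]) for m in kept])
```

In this sketch the lexical HP:x <-> MP:x2 mapping is dropped because HP:x already has a curated MP mapping, while the lexical mapping to UPHENO is kept because nothing curated competes with it.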
ZP terms - the mapping of these to UPHENO terms looks correct:
- ZP:0001965 (mandibular arch skeleton decreased size, abnormal) - jaw in general
- ZP:0003955 (ventral mandibular arch hypoplastic, abnormal) - lower jaw specifically
Warning: multiple UPHENO IDs found for group MP:0003678 (absent ear lobes), HP:0000387 (Absent earlobe), UPHENO:0027604 (absent pinna in the lobule of pinna), UPHENO:0026224 (absent lobule of pinna in the ear):
Phenotypes:
UPHENO:0026224 (absent lobule of pinna in the ear)
MP:0003678 (absent ear lobes)
HP:0000387 (Absent earlobe)
UPHENO:0027604 (absent pinna in the lobule of pinna)
Mappings:
HP:0000387 (Absent earlobe) <-> UPHENO:0027604 (absent pinna in the lobule of pinna)
MP:0003678 (absent ear lobes) <-> UPHENO:0026224 (absent lobule of pinna in the ear)
HP:0000387 (Absent earlobe) <-> MP:0003678 (absent ear lobes)
UPHENO:0027604 (absent pinna in the lobule of pinna) - this makes no sense. The lobule of pinna (earlobe) is part of the pinna (outer ear), so you can't have an absent pinna in the lobule of pinna. I think this comes from the HP EQ, which is:
has part [bfo] some (absent [pato] and characteristic of [ro] some (pinna [uberon] and part of [ro] some lobule of pinna [uberon]) and has modifier [ro] some abnormal [pato])
One potential rule: if the second mapping comes from a label-to-synonym or synonym-to-synonym match, then skip that mapping. I think this would fix a number of the stenosis, morphology/phenotype and small/hypoplasia issues.
Example:
Warning: multiple UPHENO IDs found for group HP:0000413 (Atresia of the external auditory canal), MP:0009707 (absent external auditory canal), UPHENO:0009100 (absent external acoustic meatus), UPHENO:0063616 (external acoustic meatus atresia):
Phenotypes:
UPHENO:0063616 (external acoustic meatus atresia)
UPHENO:0009100 (absent external acoustic meatus)
HP:0000413 (Atresia of the external auditory canal)
MP:0009707 (absent external auditory canal)
Mappings:
HP:0000413 (Atresia of the external auditory canal) <-> MP:0009707 (absent external auditory canal)
HP:0000413 (Atresia of the external auditory canal) <-> UPHENO:0063616 (external acoustic meatus atresia)
MP:0009707 (absent external auditory canal) <-> UPHENO:0009100 (absent external acoustic meatus)
I've fixed the synonym type in MP to make atresia a related synonym of absent.
Another example: HP:0001093 (Optic nerve dysplasia) <-> MP:0001330 (abnormal optic nerve morphology). This seems to come from a mapping of the HP:0001093 label to a narrow synonym of MP:0001330. A narrow synonym match should imply that the HP term would be a descendant of the MP term if the two ontologies were merged, not that the two are equivalent. I think for the lexical mapping you should not use anything other than the exact synonyms.
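As a rough illustration of that restriction, the filter below keeps a lexical mapping only if both sides matched on a label or an exact synonym. It assumes each mapping records which property matched on each side, in the style of the SSSOM `subject_match_field` and `object_match_field` slots, simplified here to a single value per side. Whether the current lexical mapping output carries this information is an assumption, and the second record uses placeholder IDs.

```python
# Properties we are willing to trust for an equivalence-style lexical match.
ALLOWED_FIELDS = {"rdfs:label", "oio:hasExactSynonym"}

def keep_exact_lexical_matches(mappings, allowed=ALLOWED_FIELDS):
    """Keep mappings where both sides matched on an allowed property."""
    return [
        m for m in mappings
        if m["subject_match_field"] in allowed
        and m["object_match_field"] in allowed
    ]

mappings = [
    # Label matched against a narrow synonym, as described above.
    {"subject_id": "HP:0001093", "object_id": "MP:0001330",
     "subject_match_field": "rdfs:label",
     "object_match_field": "oio:hasNarrowSynonym"},
    # Placeholder label-to-label match for contrast.
    {"subject_id": "HP:x", "object_id": "MP:y",
     "subject_match_field": "rdfs:label",
     "object_match_field": "rdfs:label"},
]
kept = keep_exact_lexical_matches(mappings)
print([(m["subject_id"], m["object_id"]) for m in kept])
```

Passing `allowed={"rdfs:label"}` gives the stricter variant of the rule, where any mapping involving a synonym on either side is skipped.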
Warning: multiple UPHENO IDs found for group HP:0006342 (Peg-shaped maxillary lateral incisors), UPHENO:0041133 (conical calcareous tooth), MP:0030499 (conical tooth), UPHENO:0041270 (conical dentition), HP:0000698 (Conical tooth):
Phenotypes:
HP:0006342 (Peg-shaped maxillary lateral incisors)
UPHENO:0041133 (conical calcareous tooth)
MP:0030499 (conical tooth)
UPHENO:0041270 (conical dentition)
HP:0000698 (Conical tooth)
Mappings:
HP:0000698 (Conical tooth) <-> UPHENO:0041270 (conical dentition)
MP:0030499 (conical tooth) <-> UPHENO:0041133 (conical calcareous tooth)
HP:0000698 (Conical tooth) <-> MP:0030499 (conical tooth)
HP:0006342 (Peg-shaped maxillary lateral incisors) <-> MP:0030499 (conical tooth)
Two issues here
- MP & HP both have the term conical tooth but are using different Uberon terms in the EQs. I think HP needs to change in this case: the term is about specific teeth, not the dentition as a whole.
- Where is HP:0006342 (Peg-shaped maxillary lateral incisors) <-> MP:0030499 (conical tooth) coming from?