mondo
Add QC to ensure that if we provide evidence for a subset, the mapping must be exact
This QC check was created as a follow up to https://github.com/monarch-initiative/mondo/pull/7681
It ensures that if an ORDO subset is declared for a term, the evidence for it (the ORDO code) also corresponds to an exact mapping. So, if
MONDO:123 subset: ordo_disease {source="Orphanet:123"}
then there must also be an exact mapping to Orphanet:123.
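The rule can be sketched in Python as follows. This is only an illustration, not the SPARQL query from this PR: the simplified stanza parser, the names `parse_stanza` and `subset_violations`, and the assumption that an exact mapping is an xref annotated with `source="MONDO:equivalentTo"` are all hypothetical, and the two `MONDO:123` stanzas are made up from the example above.

```python
import re

# Matches one key="value" pair inside an OBO trailing qualifier block {...}.
QUALIFIER = re.compile(r'(\w+)="([^"]*)"')


def parse_stanza(text):
    """Collect the id, subset and xref lines of one OBO term stanza (simplified)."""
    term = {"id": None, "subsets": [], "xrefs": []}
    for line in text.strip().splitlines():
        tag, _, rest = line.strip().partition(": ")
        value, _, quals = rest.partition(" {")
        pairs = QUALIFIER.findall(quals)
        if tag == "id":
            term["id"] = value
        elif tag == "subset":
            term["subsets"].append((value, pairs))
        elif tag == "xref":
            term["xrefs"].append((value, pairs))
    return term


def subset_violations(term, subset="ordo_disease"):
    """Return (term id, subset, CURIE) for each subset source lacking an exact xref."""
    # Assumption: "exact" means the xref carries source="MONDO:equivalentTo".
    exact = {
        xref
        for xref, pairs in term["xrefs"]
        if ("source", "MONDO:equivalentTo") in pairs
    }
    return [
        (term["id"], name, value)
        for name, pairs in term["subsets"]
        if name == subset
        for key, value in pairs
        if key == "source" and value not in exact
    ]


# Hypothetical stanzas: one with the exact mapping, one without it.
passing = """
id: MONDO:123
subset: ordo_disease {source="Orphanet:123"}
xref: Orphanet:123 {source="MONDO:equivalentTo"}
"""
failing = """
id: MONDO:123
subset: ordo_disease {source="Orphanet:123"}
xref: Orphanet:999 {source="OMIM:000000"}
"""

passing_result = subset_violations(parse_stanza(passing))
failing_result = subset_violations(parse_stanza(failing))
print(passing_result)
print(failing_result)
```

The returned triples follow the same entity, subset, value shape as the QC report lines in this thread, except that they use CURIEs rather than full IRIs.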
@matentzn I have not reviewed this since the QC failed
I assigned this to you because the QC needs to be fixed by a curator! It fails because of the test.
@matentzn I am not sure I am understanding the query correctly. What I think it is checking is to make sure that for the ordo_disease subset that the Orphanet CURIE that is listed in the source annotation is also used in an xref annotation.
If that is what is happening, then why is this line an error:
Error: http://purl.obolibrary.org/obo/MONDO_0957397,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:652487
when mondo-edit.obo contains:
id: MONDO:0957397
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
My best guess:
This has already been fixed by some other PR? Otherwise I don't understand it either.
The OBO snippet I posted was from mondo-edit.obo in the branch for this PR, qc-ordo-subset-exact-mapping.
Here is another error that I think does make sense to report as an error:
Error: http://purl.obolibrary.org/obo/MONDO_0009349,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:268936
id: MONDO:0009349
subset: ordo_disease {source="Orphanet:268936"}
xref: Orphanet:2162 {source="OMIM:236100"}
--> Is the fix to add an xref to Orphanet:268936 based on some source TBD and add source="MONDO:equivalentTo"?
After the update of this branch with the latest mondo-edit.obo there are 15 errors from this SPARQL query that need to be re-examined.
Here are the remaining 15 errors and the relevant mondo-edit.obo snippet following merging master into this branch earlier today. The general categories are:
- failures that look like they should have passed
- failures on obsolete terms that are in a subset, but do not have an Orphanet xref
- failures that look like failures, but unsure if the Orphanet in the subset needs to be removed OR if an xref needs to be added for the Orphanet term
- 1 failure (http://purl.obolibrary.org/obo/MONDO_0060596,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528084; id: MONDO:0060596) where the xref uses relatedTo
Error: http://purl.obolibrary.org/obo/MONDO_0013626,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:247353
id: MONDO:0013626
name: psoriasis 14, pustular
subset: ordo_disease {source="Orphanet:404546", source="Orphanet:163931", source="Orphanet:247353"}
xref: Orphanet:163931 {source="MONDO:equivalentTo"}
xref: Orphanet:404546 {source="OMIM:614204", source="MONDO:equivalentTo"}
--> Is the fix to add an xref or remove source="Orphanet:247353" from the subset?
Error: http://purl.obolibrary.org/obo/MONDO_0014017,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:642675
id: MONDO:0014017
name: intellectual developmental disorder with autism and macrocephaly
subset: orphanet_rare {source="Orphanet:642675"}
xref: Orphanet:106 {source="OMIM:615032"}
xref: Orphanet:642675 {xref="MONDO:equivalentTo"}
Error: http://purl.obolibrary.org/obo/MONDO_0014498,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:576349
id: MONDO:0014498
xref: Orphanet:47045 {source="DOID:0090065"}
xref: Orphanet:576349 {xref="MONDO:equivalentTo"}
Error: http://purl.obolibrary.org/obo/MONDO_0016520,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:2345
id: MONDO:0016520
name: obsolete isolated Klippel-Feil syndrome
subset: ordo_disease {source="Orphanet:2345"}
--> No xrefs to Orphanet
Error: http://purl.obolibrary.org/obo/MONDO_0018347,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:397933
id: MONDO:0018347
name: obsolete severe intellectual disability-progressive postnatal microcephaly- midline stereotypic hand movements syndrome
subset: ordo_disease {source="Orphanet:397933"}
--> This only has an xref to GARD
Error: http://purl.obolibrary.org/obo/MONDO_0018888,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:53691
id: MONDO:0018888
name: obsolete congenital cornea plana
subset: ordo_disease {source="Orphanet:53691"}
--> No xrefs to Orphanet
Error: http://purl.obolibrary.org/obo/MONDO_0019482,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:86903
id: MONDO:0019482
name: obsolete dendritic cell sarcoma not otherwise specified
subset: ordo_disease {source="Orphanet:86903"}
--> No xrefs to Orphanet
Error: http://purl.obolibrary.org/obo/MONDO_0019486,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:86909
id: MONDO:0019486
name: obsolete myoclonic epilepsy of infancy
subset: ordo_disease {source="Orphanet:86909"}
--> This only has an xref to GARD
Error: http://purl.obolibrary.org/obo/MONDO_0020548,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:99922
id: MONDO:0020548
name: obsolete ocular pemphigoid
subset: ordo_disease {source="Orphanet:99922"}
--> This only has an xref to GARD
Error: http://purl.obolibrary.org/obo/MONDO_0031219,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:252202
id: MONDO:0031219
name: mismatch repair cancer syndrome
subset: ordo_disease {source="Orphanet:252202"}
xref: Orphanet:252202 {xref="MONDO:equivalentTo"}
Error: http://purl.obolibrary.org/obo/MONDO_0033479,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:631095
id: MONDO:0033479
name: spinocerebellar ataxia 44
subset: ordo_disease {source="Orphanet:631095"}
xref: Orphanet:631095 {xref="MONDO:equivalentTo"}
Error: http://purl.obolibrary.org/obo/MONDO_0033947,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528647
id: MONDO:0033947
name: obsolete hereditary angioedema with normal C1Inh
subset: ordo_disease {source="Orphanet:528647"}
--> This only has an xref to GARD
Error: http://purl.obolibrary.org/obo/MONDO_0044067,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:636945
id: MONDO:0044067
name: candidiasis, invasive
subset: ordo_disease {source="Orphanet:636945"}
xref: Orphanet:636945 {xref="MONDO:equivalentTo"}
Error: http://purl.obolibrary.org/obo/MONDO_0060596,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528084
id: MONDO:0060596
name: neurodevelopmental disorder with dysmorphic facies and distal limb anomalies
subset: ordo_disease {source="Orphanet:528084"}
xref: Orphanet:528084 {source="MONDO:relatedTo"}
--> Change to equivalentTo
Error: http://purl.obolibrary.org/obo/MONDO_0957397,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:652487
id: MONDO:0957397
name: intellectual developmental disorder, autosomal dominant 72
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
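One detail visible in the snippets above: the terms that look like they should have passed (for example MONDO:0957397, MONDO:0031219, MONDO:0033479 and MONDO:0044067) annotate the Orphanet xref with `{xref="MONDO:equivalentTo"}`, whereas the xrefs on MONDO:0013626 use `{source="MONDO:equivalentTo"}`. Whether this difference explains those failures is not confirmed anywhere in this thread. The Python sketch below only shows that a check keyed on the `source` qualifier would report such a term; `has_exact_xref` is a hypothetical helper, and keying on `source` is an assumption about the query.

```python
import re


def has_exact_xref(stanza, curie, key="source"):
    """True if the stanza has an xref to `curie` annotated with key="MONDO:equivalentTo"."""
    pattern = re.compile(
        r"^xref: " + re.escape(curie)
        + r" \{[^}]*\b" + re.escape(key) + r'="MONDO:equivalentTo"',
        re.MULTILINE,
    )
    return bool(pattern.search(stanza))


# Stanza copied from the error listing above.
stanza = """id: MONDO:0957397
name: intellectual developmental disorder, autosomal dominant 72
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
"""

found_by_source = has_exact_xref(stanza, "Orphanet:652487")            # keyed on source=
found_by_xref = has_exact_xref(stanza, "Orphanet:652487", key="xref")  # keyed on xref=
print(found_by_source, found_by_xref)
```

If the real query does key on the `source` qualifier, these terms would need the annotation changed rather than a new xref; if it does not, the cause lies elsewhere.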
Thanks!
I would suggest we continue this after:
- [x] dev is merged into main in mondo ingest
- [x] another data release was done in mondo ingest
- [ ] I update the ORDO subsets according to our recent changes
Some of the examples you found sound like real bugs in the query, but I can't pinpoint them right now.
That plan sounds good to me!
Why is this SPARQL query only for “ordo_disease” and not the other two subsets related to Orphanet?
No particular reason other than that this was an important use case. Ideally we add all other subsets to this QC check as well. Maybe just remove the VALUES .. clause? This will test all the subsets and their annotations!
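The effect of dropping the subset restriction can be sketched in Python. This is not the SPARQL query itself: the function `subset_violations`, the dictionary layout, and the `MONDO:123` data (including `Orphanet:456`) are made up for illustration. Passing `subsets=None` plays the role of removing the `VALUES` restriction.

```python
def subset_violations(term, subsets=None):
    """Return (term id, subset, CURIE) for subset sources without an exact mapping.

    term: dict with 'id', 'subsets' ({subset name: [source CURIEs]}) and
    'exact' (set of CURIEs the term maps to exactly).
    subsets: names to check, or None to check every subset on the term.
    """
    return [
        (term["id"], name, source)
        for name, sources in term["subsets"].items()
        if subsets is None or name in subsets
        for source in sources
        if source not in term["exact"]
    ]


# Hypothetical term: the ordo_disease source is mapped exactly, the other is not.
term = {
    "id": "MONDO:123",
    "subsets": {
        "ordo_disease": ["Orphanet:123"],
        "orphanet_rare": ["Orphanet:456"],
    },
    "exact": {"Orphanet:123"},
}

restricted = subset_violations(term, subsets={"ordo_disease"})
unrestricted = subset_violations(term)
print(restricted)
print(unrestricted)
```

One consequence of the unrestricted form is that every subset carrying a source annotation is held to the same exact-mapping requirement, so it may surface failures in subsets that were never checked before.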
Should obsolete terms be in an “ordo_disease” subset and also have an xref to Orphanet? See Sabrina’s comments: "If a term is obsolete in Mondo, it doesn't make sense (to me) that it is in a rare disease subset (it would be like saying "this term does not exist anymore, but it is in a subset")." https://github.com/monarch-initiative/mondo/pull/7681#issuecomment-2099070634
IMO: we should have a really, really good reason for any ORDO class in the ordo_disease subset. Ideally this case should not exist. But in case there is a good one, then yes, it should be xrefed as well. Sabrina's problem should be solved in the way the subsets are constructed (not adding the rare subset to obsolete classes).
Are there any situations where a MONDO term would have an xref to Orphanet, but that Orphanet ID is not a source for an Orphanet subset? Is this an issue with the SPARQL query?
Hmmmmm. Yeah I guess that is possible. For example when there are two Orphanet mappings (proxy merge) and only one of them is in the ordo_disorder subset. Good question!
Chatted with Sabrina and both "MONDO:obsoleteEquivalent" and "MONDO:equivalentTo" should be in the query. If there are still failures then we need to look at the failures and see what the issues are.
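A minimal Python sketch of what accepting both values means for the check. The names `EXACT_SOURCES` and `unmatched_subset_sources` are hypothetical, and the input layout is an assumption; the real check is the SPARQL query in this PR.

```python
# Both annotation values count as an exact mapping for the purpose of this check.
EXACT_SOURCES = frozenset({"MONDO:equivalentTo", "MONDO:obsoleteEquivalent"})


def unmatched_subset_sources(subset_sources, xrefs):
    """Return subset source CURIEs that have no exact (or obsolete-exact) xref.

    subset_sources: CURIEs listed as source on the subset annotation.
    xrefs: {xref CURIE: [source annotation values on that xref]}.
    """
    exact = {
        curie
        for curie, sources in xrefs.items()
        if EXACT_SOURCES.intersection(sources)
    }
    return [curie for curie in subset_sources if curie not in exact]


# An obsolete term whose xref is annotated MONDO:obsoleteEquivalent now passes.
obsolete_case = unmatched_subset_sources(
    ["Orphanet:397593"], {"Orphanet:397593": ["MONDO:obsoleteEquivalent"]}
)
# MONDO:0060596 from the listing above still fails: its xref uses relatedTo.
related_case = unmatched_subset_sources(
    ["Orphanet:528084"], {"Orphanet:528084": ["MONDO:relatedTo"]}
)
print(obsolete_case)
print(related_case)
```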
This now fails due to 1 proxy merge:
FAIL Rule ../sparql/qc/mondo/qc-proxy-merges.sparql: 2 violation(s)
entity,property,value
http://purl.obolibrary.org/obo/MONDO_0014269,Orphanet:397593,http://purl.obolibrary.org/obo/MONDO_0018337
http://purl.obolibrary.org/obo/MONDO_0018337,Orphanet:397593,http://purl.obolibrary.org/obo/MONDO_0014269
- MONDO_0018337 is obsolete and in an "ordo_disorder" subset and has an xref to Orphanet:397593 with source MONDO:obsoleteEquivalent.
- MONDO_0014269 is not in an "ordo_disorder" subset (but is a Disorder in Orphanet) and has an xref to Orphanet:397593 with source MONDO:equivalentTo. This equivalentTo statement is correct.
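For readers unfamiliar with the report above, here is a Python sketch of how a proxy-merge check could produce those two rows. It assumes, without having seen qc-proxy-merges.sparql, that a violation is reported whenever two Mondo terms carry an exact-type xref to the same external ID; `proxy_merges` is a hypothetical name.

```python
from collections import defaultdict
from itertools import permutations


def proxy_merges(mappings):
    """Return (entity, xref, other entity) rows for xrefs shared by several terms.

    mappings: iterable of (mondo_id, xref_curie) pairs, one per exact-type xref.
    """
    by_xref = defaultdict(set)
    for mondo_id, curie in mappings:
        by_xref[curie].add(mondo_id)
    rows = []
    for curie, terms in sorted(by_xref.items()):
        # Each pair is reported in both directions, as in the QC output.
        rows.extend((a, curie, b) for a, b in permutations(sorted(terms), 2))
    return rows


# The two xrefs described above: one obsoleteEquivalent, one equivalentTo.
rows = proxy_merges([
    ("MONDO:0018337", "Orphanet:397593"),
    ("MONDO:0014269", "Orphanet:397593"),
])
for row in rows:
    print(",".join(row))
```

Under this assumption, removing the xref from either term leaves a single term per Orphanet ID and clears both rows at once.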
What's the best way to handle this?
MONDO_0018337
MONDO_0014269
Removed xref and subset on obsolete term and added subset to correct/active Mondo term.