Add QC to ensure that if we provide evidence for a subset, the mapping must be exact

Open matentzn opened this issue 9 months ago • 11 comments

This QC check was created as a follow up to https://github.com/monarch-initiative/mondo/pull/7681

It ensures that, if a term is declared to be in an ORDO subset, the evidence for that subset membership (the ORDO code) must also correspond to an exact mapping. So:

If

MONDO:123 subset: ordo_disease {source="Orphanet:123"} 

then there must also be an exact mapping to Orphanet:123.
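Here is a minimal Python sketch of this rule. It assumes a simplified, line-based reading of a single OBO stanza (`tag: value {key="value", ...}`). The real check is a SPARQL query that this thread does not show. The helper name `unmapped_subset_evidence` is hypothetical, and counting an xref as exact only when it carries `source="MONDO:equivalentTo"` is also an assumption.

```python
import re

# One key="value" pair inside an OBO trailing-qualifier block {...}.
QUALIFIER = re.compile(r'(\w+)="([^"]*)"')


def unmapped_subset_evidence(stanza: str, subset: str = "ordo_disease") -> list[str]:
    """Return Orphanet CURIEs cited as evidence for `subset` without an exact xref.

    Hypothetical helper: an xref counts as exact only if one of its
    qualifiers is source="MONDO:equivalentTo".
    """
    evidence: set[str] = set()
    exact: set[str] = set()
    for line in stanza.splitlines():
        tag, _, rest = line.partition(": ")
        pairs = QUALIFIER.findall(rest)
        if tag == "subset" and rest.split(" ", 1)[0] == subset:
            # Orphanet codes given as source= evidence for the subset membership.
            evidence.update(v for k, v in pairs if k == "source" and v.startswith("Orphanet:"))
        elif tag == "xref" and ("source", "MONDO:equivalentTo") in pairs:
            exact.add(rest.split(" ", 1)[0])
    return sorted(evidence - exact)


if __name__ == "__main__":
    example = 'id: MONDO:123\nsubset: ordo_disease {source="Orphanet:123"}\n'
    print(unmapped_subset_evidence(example))  # ['Orphanet:123']
```

An empty result means every Orphanet code cited for the subset is backed by an exact mapping.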

matentzn, May 07 '24

@matentzn I have not reviewed this since the QC failed.

twhetzel, May 30 '24

I assigned this to you because the QC needs to be fixed by a curator! It fails because of the new test.

matentzn, May 30 '24

@matentzn I am not sure I am understanding the query correctly. What I think it is checking is that, for the ordo_disease subset, the Orphanet CURIE listed in the source annotation is also used in an xref annotation.

If that is what is happening, then why is the following reported as an error when mondo-edit.obo contains the stanza below it?
Error: http://purl.obolibrary.org/obo/MONDO_0957397,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:652487
id: MONDO:0957397
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
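One detail in this stanza can be made explicit. The subset line carries its evidence under the qualifier key `source`, while the xref line carries `MONDO:equivalentTo` under the key `xref`. The minimal sketch below splits a line's trailing qualifier block into key/value pairs so the two can be compared. `parse_qualifiers` is a hypothetical helper, not part of the Mondo tooling. Whether the key difference matters depends on which annotation property the actual SPARQL query matches, and the query is not shown in this thread.

```python
import re

# One key="value" pair inside an OBO trailing-qualifier block {...}.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_qualifiers(line: str) -> list[tuple[str, str]]:
    """Return the (key, value) pairs from an OBO line's trailing {...} block."""
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end < start:
        return []  # no qualifier block on this line
    return PAIR.findall(line[start + 1:end])


subset_line = 'subset: ordo_disease {source="Orphanet:652487"}'
xref_line = 'xref: Orphanet:652487 {xref="MONDO:equivalentTo"}'
print(parse_qualifiers(subset_line))  # [('source', 'Orphanet:652487')]
print(parse_qualifiers(xref_line))    # [('xref', 'MONDO:equivalentTo')]
```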

twhetzel, Jun 03 '24

My best guess:

Perhaps this has already been fixed by another PR? Otherwise, I don't understand it either.

matentzn, Jun 04 '24

The OBO snippet I posted was from mondo-edit.obo in the branch for this PR, qc-ordo-subset-exact-mapping.

twhetzel, Jun 04 '24

Here is another error that I think does make sense to report as an error:
Error: http://purl.obolibrary.org/obo/MONDO_0009349,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:268936
id: MONDO:0009349
subset: ordo_disease {source="Orphanet:268936"}
xref: Orphanet:2162 {source="OMIM:236100"}
--> Is the fix to add an xref to Orphanet:268936 (based on some source TBD) with source="MONDO:equivalentTo"?

twhetzel, Jun 04 '24

After updating this branch with the latest mondo-edit.obo, there are 15 errors from this SPARQL query that need to be re-examined.

twhetzel, Jun 04 '24

Here are the remaining 15 errors, with the relevant mondo-edit.obo snippets, after merging master into this branch earlier today. The general categories are:

  • failures that look like they should have passed
  • failures on obsolete terms that are in a subset but do not have an Orphanet xref
  • failures that do look like failures, where it is unclear whether the Orphanet ID should be removed from the subset or an xref should be added for the Orphanet term
  • 1 failure (MONDO:0060596, Orphanet:528084) where the xref uses relatedTo

Error: http://purl.obolibrary.org/obo/MONDO_0013626,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:247353
id: MONDO:0013626
name: psoriasis 14, pustular
subset: ordo_disease {source="Orphanet:404546", source="Orphanet:163931", source="Orphanet:247353"}
xref: Orphanet:163931 {source="MONDO:equivalentTo"}
xref: Orphanet:404546 {source="OMIM:614204", source="MONDO:equivalentTo"}
--> Is the fix to add an xref or remove source="Orphanet:247353" from the subset?


Error: http://purl.obolibrary.org/obo/MONDO_0014017,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:642675
id: MONDO:0014017
name: intellectual developmental disorder with autism and macrocephaly
subset: orphanet_rare {source="Orphanet:642675"}
xref: Orphanet:106 {source="OMIM:615032"}
xref: Orphanet:642675 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0014498,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:576349
id: MONDO:0014498
xref: Orphanet:47045 {source="DOID:0090065"}
xref: Orphanet:576349 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0016520,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:2345
id: MONDO:0016520
name: obsolete isolated Klippel-Feil syndrome
subset: ordo_disease {source="Orphanet:2345"}
--> No xrefs to Orphanet


Error: http://purl.obolibrary.org/obo/MONDO_0018347,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:397933
id: MONDO:0018347
name: obsolete severe intellectual disability-progressive postnatal microcephaly-midline stereotypic hand movements syndrome
subset: ordo_disease {source="Orphanet:397933"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0018888,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:53691
id: MONDO:0018888
name: obsolete congenital cornea plana
subset: ordo_disease {source="Orphanet:53691"}
--> No xrefs to Orphanet


Error: http://purl.obolibrary.org/obo/MONDO_0019482,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:86903
id: MONDO:0019482
name: obsolete dendritic cell sarcoma not otherwise specified
subset: ordo_disease {source="Orphanet:86903"}
--> No xrefs to Orphanet


Error: http://purl.obolibrary.org/obo/MONDO_0019486,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:86909
id: MONDO:0019486
name: obsolete myoclonic epilepsy of infancy
subset: ordo_disease {source="Orphanet:86909"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0020548,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:99922
id: MONDO:0020548
name: obsolete ocular pemphigoid
subset: ordo_disease {source="Orphanet:99922"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0031219,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:252202
id: MONDO:0031219
name: mismatch repair cancer syndrome
subset: ordo_disease {source="Orphanet:252202"}
xref: Orphanet:252202 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0033479,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:631095
id: MONDO:0033479
name: spinocerebellar ataxia 44
subset: ordo_disease {source="Orphanet:631095"}
xref: Orphanet:631095 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0033947,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528647
id: MONDO:0033947
name: obsolete hereditary angioedema with normal C1Inh
subset: ordo_disease {source="Orphanet:528647"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0044067,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:636945
id: MONDO:0044067
name: candidiasis, invasive
subset: ordo_disease {source="Orphanet:636945"}
xref: Orphanet:636945 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0060596,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528084
id: MONDO:0060596
name: neurodevelopmental disorder with dysmorphic facies and distal limb anomalies
subset: ordo_disease {source="Orphanet:528084"}
xref: Orphanet:528084 {source="MONDO:relatedTo"}
--> Change to equivalentTo


Error: http://purl.obolibrary.org/obo/MONDO_0957397,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:652487
id: MONDO:0957397
name: intellectual developmental disorder, autosomal dominant 72
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
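The categories listed at the top of this comment could be assigned roughly with a heuristic like the following. This is only a sketch. The helper name `triage`, the short category labels, and treating a name that starts with "obsolete" as an obsolete term are all assumptions for illustration. It counts `MONDO:equivalentTo` under any qualifier key, so a stanza like MONDO:0957397 lands in the "looks like it should have passed" bucket.

```python
import re

PAIR = re.compile(r'(\w+)="([^"]*)"')


def triage(stanza: str, curie: str) -> str:
    """Put one QC failure into a rough category (hypothetical heuristic).

    `curie` is the Orphanet ID reported on the Error line for this stanza.
    """
    fields = [line.partition(": ") for line in stanza.splitlines()]
    name = next((rest for tag, _, rest in fields if tag == "name"), "")
    # Map each xref CURIE to the values of its qualifiers, whatever the key.
    xrefs = {
        rest.split(" ", 1)[0]: {value for _, value in PAIR.findall(rest)}
        for tag, _, rest in fields
        if tag == "xref"
    }
    has_orphanet_xref = any(c.startswith("Orphanet:") for c in xrefs)
    if name.startswith("obsolete") and not has_orphanet_xref:
        return "obsolete, no Orphanet xref"
    if curie not in xrefs:
        return "no xref for cited ID"  # remove from subset, or add an xref?
    if "MONDO:equivalentTo" not in xrefs[curie]:
        return "xref not exact"  # e.g. MONDO:relatedTo
    return "exact xref present"  # looks like it should have passed
```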

twhetzel, Jun 04 '24

Thanks!

I would suggest we continue this after:

  • [x] dev is merged into main in mondo ingest
  • [x] another data release was done in mondo ingest
  • [ ] I update the ORDO subsets according to our recent changes

Some of the examples you found sound like real bugs in the query, but I can't pinpoint them right now.

matentzn, Jun 05 '24

That plan sounds good to me!

twhetzel, Jun 05 '24

Why is this SPARQL query only for “ordo_disease” and not the other two subsets related to Orphanet?

No particular reason, other than that this was an important use case. Ideally, we add all the other subsets to this QC check as well. Maybe just remove the VALUES .. clause? This will test all the subsets and their annotations!
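In rough Python terms, and not as the actual SPARQL, the VALUES clause behaves like a filter on subset names. In the hypothetical helper below, `subsets=None` corresponds to removing the clause, so the Orphanet evidence of every subset gets tested. The stanza in the test is made up in the style of the MONDO:123 example. Counting only `source="MONDO:equivalentTo"` xrefs as exact is an assumption.

```python
import re
from typing import Optional

PAIR = re.compile(r'(\w+)="([^"]*)"')


def missing_exact_xrefs(stanza: str, subsets: Optional[set[str]] = None) -> list[tuple[str, str]]:
    """Return (subset, Orphanet CURIE) pairs whose evidence lacks an exact xref.

    subsets=None checks every subset, roughly like dropping the VALUES clause;
    a set such as {"ordo_disease"} restricts the check, like keeping it.
    """
    cited: list[tuple[str, str]] = []
    exact: set[str] = set()
    for line in stanza.splitlines():
        tag, _, rest = line.partition(": ")
        pairs = PAIR.findall(rest)
        if tag == "subset":
            name = rest.split(" ", 1)[0]
            if subsets is None or name in subsets:
                cited += [(name, v) for k, v in pairs if k == "source" and v.startswith("Orphanet:")]
        elif tag == "xref" and ("source", "MONDO:equivalentTo") in pairs:
            exact.add(rest.split(" ", 1)[0])
    return [(name, curie) for name, curie in cited if curie not in exact]
```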

Should obsolete terms be in an “ordo_disease” subset and also have an xref to Orphanet? See Sabrina’s comments: "If a term is obsolete in Mondo, it doesn't make sense (to me) that it is in a rare disease subset (it would be like saying "this term does not exist anymore, but it is in a subset")." https://github.com/monarch-initiative/mondo/pull/7681#issuecomment-2099070634

IMO, we should have a really, really good reason for any obsolete class to be in the ordo_disease subset. Ideally, this case should not exist, but if there is a good reason, then yes, it should be xrefed as well. Sabrina's problem should be solved in the way the subsets are constructed (by not adding rare disease subsets to obsolete classes).
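A simple check could also guard the construction rule described above, which is to keep rare disease subsets off obsolete classes. This is a minimal sketch with a hypothetical helper name. It flags every subset on an obsolete term and leaves the "really, really good reason" exceptions to a curator. A term counts as obsolete if it has `is_obsolete: true` or, heuristically, if its name starts with "obsolete ".

```python
def subsets_on_obsolete_term(stanza: str) -> list[str]:
    """Return the subsets assigned to an obsolete term; ideally always empty."""
    fields = [line.partition(": ") for line in stanza.splitlines()]
    obsolete = any(
        (tag == "is_obsolete" and rest.strip() == "true")
        or (tag == "name" and rest.startswith("obsolete "))  # naming heuristic
        for tag, _, rest in fields
    )
    if not obsolete:
        return []
    # Subset name is the first token after "subset: ".
    return [rest.split(" ", 1)[0] for tag, _, rest in fields if tag == "subset"]
```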

Are there any situations where a MONDO term would have an xref to Orphanet, but then that Orphanet ID not be a source for an Orphanet subset? Is this an issue with the SPARQL query?

Hmmmmm. Yeah, I guess that is possible, for example when there are two Orphanet mappings (proxy merge) and only one of them is in the ordo_disorder subset. Good question!

matentzn, Jun 12 '24

I chatted with Sabrina, and both "MONDO:obsoleteEquivalent" and "MONDO:equivalentTo" should be accepted by the query. If there are still failures, we need to look at them and see what the issues are.
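Accepting both predicates amounts to checking membership in a small set instead of comparing against a single value. The sketch below assumes a simplified, line-based reading of OBO stanzas, and its helper and constant names are hypothetical. It says nothing about how the SPARQL query itself encodes the change.

```python
import re

PAIR = re.compile(r'(\w+)="([^"]*)"')

# Mapping sources accepted as exact, per the discussion above.
EXACT_SOURCES = frozenset({"MONDO:equivalentTo", "MONDO:obsoleteEquivalent"})


def exact_xrefs(stanza: str) -> set[str]:
    """Return the xref CURIEs whose source= qualifiers include an exact mapping."""
    found: set[str] = set()
    for line in stanza.splitlines():
        tag, _, rest = line.partition(": ")
        if tag == "xref" and any(
            key == "source" and value in EXACT_SOURCES for key, value in PAIR.findall(rest)
        ):
            found.add(rest.split(" ", 1)[0])
    return found
```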

twhetzel, Jul 29 '24

This now fails due to 1 proxy merge:

FAIL Rule ../sparql/qc/mondo/qc-proxy-merges.sparql: 2 violation(s)
entity,property,value
http://purl.obolibrary.org/obo/MONDO_0014269,Orphanet:397593,http://purl.obolibrary.org/obo/MONDO_0018337
http://purl.obolibrary.org/obo/MONDO_0018337,Orphanet:397593,http://purl.obolibrary.org/obo/MONDO_0014269
  • MONDO_0018337 is obsolete and in an "ordo_disorder" subset and has an xref to Orphanet:397593 with source MONDO:obsoleteEquivalent.

  • MONDO_0014269 is not in an "ordo_disorder" subset (but is a Disorder in Orphanet) and has an xref to Orphanet:397593 with source MONDO:equivalentTo. This equivalentTo statement is correct.

What's the best way to handle this?

[Screenshot: MONDO_0018337]

[Screenshot: MONDO_0014269]

twhetzel, Jul 30 '24

Removed the xref and subset from the obsolete term (MONDO_0018337) and added the subset to the correct, active Mondo term (MONDO_0014269).
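A small sketch can illustrate both the proxy-merge failure and its fix. The real rule is ../sparql/qc/mondo/qc-proxy-merges.sparql, which is not shown here. The hypothetical helper below only mimics the shape of its output: an Orphanet CURIE that is xref'd from more than one Mondo term gets reported once in each direction, as in the entity,property,value rows above. It treats every Orphanet xref equally, which is an assumption. Once the xref is removed from the obsolete term, nothing is reported.

```python
from collections import defaultdict
from itertools import permutations


def proxy_merges(xrefs: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (entity, xref, other entity) rows for xrefs shared by 2+ terms.

    `xrefs` maps each Mondo term to its Orphanet xref CURIEs (hypothetical
    input shape). Each offending pair is reported in both directions.
    """
    terms_by_xref: dict[str, list[str]] = defaultdict(list)
    for term, curies in xrefs.items():
        for curie in curies:
            terms_by_xref[curie].append(term)
    rows = [
        (a, curie, b)
        for curie, terms in terms_by_xref.items()
        for a, b in permutations(sorted(terms), 2)
    ]
    return sorted(rows)
```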

twhetzel, Jul 31 '24