WIP: First round QC for cleaning source metadata

Open matentzn opened this issue 2 years ago • 7 comments

Probably needs some discussions..

This adds checks for cases where we have

  • multiple conflicting source annotations
  • no source annotations
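
For discussion, here is a minimal sketch of what the two checks could look like. It is not the actual QC implementation: the `Xref` record, the function name, and in particular the `EXCLUSIVE_SOURCES` set (the source values assumed to conflict with each other) are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Xref(NamedTuple):
    """Hypothetical record for one xref axiom and its source annotations."""
    subject: str              # Mondo term, e.g. "MONDO:123"
    target: str               # external term, e.g. "OMIM:123"
    sources: FrozenSet[str]   # values of the `source` annotations on the xref


# Assumption: an xref may carry at most one of these precision-style sources.
EXCLUSIVE_SOURCES = frozenset(
    {"MONDO:equivalentTo", "MONDO:subClassOf", "MONDO:superClassOf"}
)


def check_source_annotations(
    xrefs: Iterable[Xref],
    exclusive: FrozenSet[str] = EXCLUSIVE_SOURCES,
) -> Tuple[List[Xref], List[Xref]]:
    """Return (xrefs with no source, xrefs with conflicting sources)."""
    missing: List[Xref] = []
    conflicting: List[Xref] = []
    for xref in xrefs:
        if not xref.sources:
            missing.append(xref)
        elif len(xref.sources & exclusive) > 1:
            # More than one mutually exclusive source on the same xref.
            conflicting.append(xref)
    return missing, conflicting
```

Under this sketch, any source value outside `exclusive` never counts as a conflict, so allowing an extra value is a matter of leaving it out of that set.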

Here are the current failures: https://docs.google.com/spreadsheets/d/10wrrwp0ewtN30MwKjG6KBgV7oQMuLpkqAuL9Xs7WSvw/edit#gid=0

@nicolevasilevsky I think we need to discuss in a meeting how to best deal with this.

matentzn avatar Mar 03 '22 19:03 matentzn

@matentzn With the removal of the MONDO:subClassOf and MONDO:superClassOf, I think this is done, with a few exceptions:

  1. MONDO:cjm - should we convert this to Chris' ORCID? https://orcid.org/0000-0002-6601-2165. There are 1300 instances of this.
  2. MONDO:preferredExternal - this is allowed, could you revise the QC check to allow for this?

nicolevasilevsky avatar Apr 11 '22 23:04 nicolevasilevsky

Yeah, mention in next 1:1 - we have too many bulk edits happening at the moment, but let's change all these cjms and other attributions to ORCIDs!

matentzn avatar Apr 12 '22 08:04 matentzn

> Yeah, mention in next 1:1 - we have too many bulk edits happening at the moment, but let's change all these cjms and other attributions to ORCIDs!

it is already on our agenda :)

nicolevasilevsky avatar Apr 12 '22 15:04 nicolevasilevsky

Blocked by https://github.com/monarch-initiative/monarch-mapping-commons/issues/10

Revisit 1st April

matentzn avatar Apr 18 '22 12:04 matentzn

@matentzn is going to regenerate this table. There are currently too many for manual review.

nicolevasilevsky avatar May 18 '22 15:05 nicolevasilevsky

In the meeting two separate issues were conflated: the absence of source annotations (which is what this issue is all about, as opposed to my misleading comments up top) and the presence of conflicting annotations, for which I created a new PR: #4943

This PR is indeed still blocked by the issue linked above.

matentzn avatar May 19 '22 08:05 matentzn

This PR checks all of the source annotations on xrefs: we want to make sure each cross-reference has some kind of source annotation.

Some don't, because we removed MONDO:superClassOf and MONDO:subClassOf.

This can only be dealt with once we have the boomer mappings going back into Mondo.

nicolevasilevsky avatar May 19 '22 15:05 nicolevasilevsky

@nicolevasilevsky next time we meet, we could finish this PR.

I wrote a method that takes care of more than 4600 violations: if one Mondo term has an xref to an external term that is marked as equivalent (source="MONDO:equivalentTo"), and another Mondo term has an xref to that same external term without an equivalence annotation, then we delete the latter xref. For example:

MONDO:123 xref OMIM:123 {source="ORDO:123"}
MONDO:987 xref OMIM:123 {source="MONDO:equivalentTo"}

In this case, we delete the former (the MONDO:123 xref, which has no equivalence annotation).
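
The rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the method in the PR: xrefs are modelled as plain `(subject, target, sources)` tuples, and the function name and the `protected` parameter (a hook for source values whose xrefs should never be removed) are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# (Mondo term, external term, set of source annotation values)
XrefTuple = Tuple[str, str, Set[str]]


def find_redundant_xrefs(
    xrefs: Iterable[XrefTuple],
    equivalent: str = "MONDO:equivalentTo",
    protected: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, str]]:
    """Return (subject, target) pairs for xrefs the rule would delete."""
    xrefs = list(xrefs)

    # Index: external term -> Mondo terms that claim it as an equivalent.
    equivalent_subjects: Dict[str, Set[str]] = {}
    for subject, target, sources in xrefs:
        if equivalent in sources:
            equivalent_subjects.setdefault(target, set()).add(subject)

    to_delete: List[Tuple[str, str]] = []
    for subject, target, sources in xrefs:
        if equivalent in sources:
            continue  # equivalence xrefs are always kept
        if sources & protected:
            continue  # hypothetical escape hatch for protected sources
        # Delete only if a *different* Mondo term is equivalent to the target.
        if equivalent_subjects.get(target, set()) - {subject}:
            to_delete.append((subject, target))
    return to_delete
```

Applied to the two example axioms, this sketch would flag only the `MONDO:123` xref to `OMIM:123` for deletion.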

To finish:

  1. Spot-check 10 random removals to confirm they are correct (search for the equivalent ones)
  2. Deal with the remaining 62 cases manually

matentzn avatar Feb 26 '23 20:02 matentzn

we don't want to remove the MONDO:includedEntryInOMIM annotations

nicolevasilevsky avatar Feb 28 '23 02:02 nicolevasilevsky

> MONDO:includedEntryInOMIM

What does this mean?

matentzn avatar Feb 28 '23 09:02 matentzn

Re: MONDO:includedEntryInOMIM

see: https://github.com/monarch-initiative/mondo/issues/5507

nicolevasilevsky avatar Mar 06 '23 16:03 nicolevasilevsky

@matentzn can we merge this?

nicolevasilevsky avatar Mar 07 '23 04:03 nicolevasilevsky

Yes! Great job!

matentzn avatar Mar 07 '23 09:03 matentzn