incorrect subclass-of edge in KG2.9.0pre
This issue was brought to our attention by Kaiwen He of Team Unsecret Agent.
"UMLS:C2267095" ("M [Preparations]") -"biolink:subclass_of"-> "UMLS:C0025598" ("metformin")
"UMLS:C0025424" ("Mercury") -"biolink:subclass_of"-> "UMLS:C2267095" ("M [Preparations]")
This appears to come from the UMLS source MED-RT:
{"domain_range_exclusion": false, "id": "UMLS:C2267095---UMLS:has_parent---None---None---None---UMLS:C0025598---umls_source:MED-RT", "negated": false, "object": "UMLS:C0025598", "predicate": "biolink:subclass_of", "predicate_label": "has_parent", "primary_knowledge_source": "infores:medrt-umls", "publications": [], "publications_info": {}, "qualified_object_aspect": null, "qualified_object_direction": null, "qualified_predicate": null, "relation_label": "has_parent", "source_predicate": "UMLS:has_parent", "subject": "UMLS:C2267095", "update_date": "2023"}
{"domain_range_exclusion": false, "id": "UMLS:C2267095---UMLS:parent_of---None---None---None---UMLS:C0025424---umls_source:MED-RT", "negated": false, "object": "UMLS:C2267095", "predicate": "biolink:subclass_of", "predicate_label": "parent_of", "primary_knowledge_source": "infores:medrt-umls", "publications": [], "publications_info": {}, "qualified_object_aspect": null, "qualified_object_direction": null, "qualified_predicate": null, "relation_label": "INVERTED:parent_of", "source_predicate": "UMLS:parent_of", "subject": "UMLS:C0025424", "update_date": "2023"}
This bug seems to be an issue in the UMLS source MED-RT, which provides edges with a relation_label of has_parent and a direction of N, but with the drug class and the drug swapped relative to what they should be. I think MED-RT should be using a direction of Y if the relation is really has_parent; however, the relation is listed as CHD, so it may well be that they should be using the relation label parent_of and keeping the direction N.
In any event, although I'd prefer to patch this in the extract script, in the interest of time I'm putting a patch in the umls_util.py module.
@ecwood can take a look when they are back in Summer 2025.
"relations": {
"MED-RT": {
"CHD,has_parent,N": [
"C0025598"
],
Note: C0025598 is metformin, the drug (not the drug class).
Note: I have not actually tested this code yet; I need to do that.
I tested this on kg2102build.rtx.ai in /home/ubuntu/issue-377 and verified that it works. With the above commit, the new block of JSON for the edge is:
{
"domain_range_exclusion": false,
"id": "UMLS:C0025598---UMLS:has_parent---None---None---None---UMLS:C2267095---umls_source:MED-RT",
"negated": false,
"object": "UMLS:C2267095",
"predicate": null,
"primary_knowledge_source": "umls_source:MED-RT",
"publications": [],
"publications_info": {},
"qualified_object_aspect": null,
"qualified_object_direction": null,
"qualified_predicate": null,
"relation_label": "has_parent",
"source_predicate": "UMLS:has_parent",
"subject": "UMLS:C0025598",
"update_date": "2023"
}
On kg2102build.rtx.ai, the old file (/home/ubuntu/kg2-build/kg2-umls-edges.jsonl) has:
grep 'MED-RT' /home/ubuntu/kg2-build/kg2-umls-edges.jsonl | grep has_parent | grep C0025598 | jq . | less
{
"domain_range_exclusion": false,
"id": "UMLS:C2267095---UMLS:has_parent---None---None---None---UMLS:C0025598---umls_source:MED-RT",
"negated": false,
"object": "UMLS:C0025598",
"predicate": null,
"primary_knowledge_source": "umls_source:MED-RT",
"publications": [],
"publications_info": {},
"qualified_object_aspect": null,
"qualified_object_direction": null,
"qualified_predicate": null,
"relation_label": "has_parent",
"source_predicate": "UMLS:has_parent",
"subject": "UMLS:C2267095",
"update_date": "2023"
}
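Comparing the old and new records above, the only fields that differ are subject, object, and the two CUIs embedded in id. A minimal sketch of that transformation follows, assuming the intermediate edge format shown in this thread. The function name swap_medrt_has_parent is hypothetical and the id layout is inferred from the records above; this is not the actual code in umls_util.py.

```python
def swap_medrt_has_parent(edge):
    """Return a copy of `edge` with subject and object swapped when it is a
    MED-RT has_parent edge; return any other edge unchanged."""
    if (edge.get("primary_knowledge_source") != "umls_source:MED-RT"
            or edge.get("relation_label") != "has_parent"):
        return edge
    fixed = dict(edge)
    fixed["subject"], fixed["object"] = edge["object"], edge["subject"]
    # Assumed id layout, inferred from the records in this thread:
    # subject---predicate---None---None---None---object---source
    parts = edge["id"].split("---")
    parts[0], parts[5] = parts[5], parts[0]
    fixed["id"] = "---".join(parts)
    return fixed


old_edge = {
    "id": "UMLS:C2267095---UMLS:has_parent---None---None---None"
          "---UMLS:C0025598---umls_source:MED-RT",
    "subject": "UMLS:C2267095",
    "object": "UMLS:C0025598",
    "relation_label": "has_parent",
    "primary_knowledge_source": "umls_source:MED-RT",
}
new_edge = swap_medrt_has_parent(old_edge)
print(new_edge["id"])
```

The input dict is left unmodified, so the old and new records can be compared side by side.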
It looks like this bugfix works, but I will not close the issue until we can rerun the following Cypher query in the next build:
match (n {id: 'UMLS:C2267095'})-[r]-(m {id: 'UMLS:C0025598'}) return n,m
This should show two subclass_of edges, both pointing from metformin to the drug class M [Preparations].
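For reference, the same direction check can be sketched offline against exported edge records rather than in Neo4j. This assumes each edge is a dict with subject and object fields; the function name check_direction is hypothetical.

```python
def check_direction(edges, child, parent):
    """Find edges connecting `child` and `parent` in either direction and
    report (number_found, all_point_from_child_to_parent)."""
    pair = {child, parent}
    found = [e for e in edges if {e["subject"], e["object"]} == pair]
    ok = bool(found) and all(
        e["subject"] == child and e["object"] == parent for e in found
    )
    return len(found), ok


sample_edges = [
    {"subject": "UMLS:C0025598", "object": "UMLS:C2267095"},
    {"subject": "UMLS:C0025598", "object": "UMLS:C2267095"},
    {"subject": "UMLS:C0025424", "object": "UMLS:C2267095"},
]
print(check_direction(sample_edges, "UMLS:C0025598", "UMLS:C2267095"))
```

For the metformin case, the expected outcome after the fix is two edges found, both pointing from UMLS:C0025598 to UMLS:C2267095.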
I believe that this issue only affects MED-RT UMLS edges with relation label has_parent; it does not affect all UMLS edges, or even all MED-RT UMLS edges. Thus, this issue does not justify terminating and restarting the in-progress KG2.10.2pre build.
The number of edges affected is 20,171:
grep 'MED-RT' /home/ubuntu/kg2-build/kg2-umls-edges.jsonl | grep has_parent | wc -l
20171
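A field-based version of this count can be sketched in Python, assuming one JSON edge per line with the primary_knowledge_source and relation_label fields shown earlier in this thread. The function name count_medrt_has_parent is hypothetical. Because grep matches substrings anywhere on the line, the two counts are not guaranteed to agree exactly.

```python
import json


def count_medrt_has_parent(lines):
    """Count JSON-lines edge records whose source is MED-RT and whose
    relation_label is exactly has_parent."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        edge = json.loads(line)
        if (edge.get("primary_knowledge_source") == "umls_source:MED-RT"
                and edge.get("relation_label") == "has_parent"):
            count += 1
    return count


sample_lines = [
    '{"primary_knowledge_source": "umls_source:MED-RT", "relation_label": "has_parent"}',
    '{"primary_knowledge_source": "umls_source:MED-RT", "relation_label": "parent_of"}',
    '{"primary_knowledge_source": "umls_source:MSH", "relation_label": "has_parent"}',
    '',
]
print(count_medrt_has_parent(sample_lines))
```

To run it on a real file, pass an open file handle, for example `with open(path) as f: count_medrt_has_parent(f)`.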
Confirmed this bug is still present in KG2.10.2pre.
Reconstructing the chronology from the above messages, it looks like this patch came too late for the KG2.10.2pre build, so I think we will need to test it in the KG2.10.3pre build.
This seems to be solved in KG2.10.3:
Closing.