KG2 suddenly non-responsive to `treats` predicate
Note in this result: https://arax.ci.transltr.io/?r=252131, RTX-KG2C says MetaKG indicates this qedge is unsupported. The query asks for the treats predicate. It appears that http://kg2canonicalized2.rtx.ai:7474/browser/ says there are no treats predicates in KG2C but rather only biolink:treats_or_applied_or_studied_to_treat which is a superclass of treats, so it seems to be a subclass reasoning issue.
High priority since we really only have until Friday to fix this.
That was KG2.9.2 version, btw
so I think our predicate reasoning is working appropriately - treats is a descendant of treats_or_applied_or_studied_to_treat, which means that when someone asks for a treats edge, we are only allowed to respond with edges that have the predicate treats or a descendant of treats (not ancestor; if ancestors were included then that would mean we could respond with related_to edges as well! which makes no sense).
so if someone asks for treats_or_applied_or_studied_to_treat in their query graph, then we can answer with treats edges or with treats_or_applied_or_studied_to_treat edges. but not the other way around.
does that make sense? in my understanding nothing is wrong with our predicate reasoning here..
okay, thanks @amykglen that all seems reasonable to me. Yet, we still need a solution to this problem.
I think we somehow need to be able to to walk up to ancestors somewhat judiciously when the query is "treats + knowledge_type=inferred" (which is what the UI does). When someone asks "which chemicals may treat disease X"? and knowledge_type=inferred, then we need to somehow be able to return these mixin edges.
Said another way, it seems to me that before the treats refactor, we were able to return information that drug X treats disease Y because DrugBank says so. But now that's all gone, and now everything in KG2 is treats_or_applied_or_studied_to_treat? and we're not sure about any treatments anymore? This seems like a big loss. I suppose if we get treats with knowledge_type=lookup, then we should only go with what we're sure of. but if we're not sure of anything, then that's a big loss.
I don't know what the solution here is exactly, but I think one is urgently needed. If I understand correctly, essentially KG2 is completely out of the game for MVP1 at this point.
ah, I see. I'm not familiar with the details of the treats refactor in KG2pre, but I would assume we're now required to use treats_or_applied_or_studied_to_treat for all former treats edges, since that's the transformation that happened (maybe @ecwood can confirm that we're not allowed to use the treats predicate for those edges anymore). I guess I was assuming that with the whole treats refactor the MVP queries would be updated to use treats_or_applied_or_studied_to_treat, but I guess that's not the case!
it should be quite easy technically to add an exception to our predicate reasoning like you mentioned, @edeutsch, where we could also look for treats_or_applied_or_studied_to_treat edges when a "treats + knowledge_type=inferred" query comes in. does that seem like a good (at least) temporary solution? I can do this today and we can roll it out tomorrow along with KG2.10.0c. that would buy us a little time to discuss with the team whether it's an option to use the treats predicate in KG2pre and so on.
though I suppose people might not like seeing treats_or_applied_or_studied_to_treat edges in their answer when they asked for treats... not sure if that's an issue. I suppose we could also have a patch to transform the output edges to treats, but then that's kind of undoing the whole treats refactor, ha. I agree the solution is unclear here - it might help to have some insight from the KG2pre team
Yeah, this refactor seems a bit of a mess. From Chris B: it appears that treats will still be the predicate for MVP1, and treats needs to be the predicate in the top level result response. treats_or_applied_to_treat... edges can then be on support graphs or the like.
And at least in KG2.9.2, it appears there are no treats edges, just the mixin
hmm, very odd. I suppose we could argue that it's ok for ARAX to, in the case of an inferred query, consider KG2's treats_or_applied_or_studied_to_treat edges to be treats edges? that is, ask KG2 for treats_or_applied_or_studied_to_treat edges and then convert their predicates to treats in the answer KG? I suppose it is 'creative' mode after all, ha...
The "treats" refactor was done before I arrived for the summer, so I am unfamiliar with the details of it. @saramsey likely has more information regarding its execution in KG2pre
I am also a bit unfamiliar with the details of the treats refactor, but I was under the impression that "treats" was split into multiple predicates depending on what was really known. When it is well established that metformin treats type 2 diabetes, I'm under the impression that that should remain a "treats" edge in KG2. it is just when it is unclear whether the source is telling us that this is well-established treatment vs. currently undergoing clinical trials vs. being used by some as an off-label way to treat, then we should start using other predicates.
So I am inferring that if KG2 no longer has any "treats" edges, that means that we no longer have well-established treatments in KG2? All we have is a large pile of relationships that might be established, or they might just be undergoing trials, or they might be sometimes applied in a off-label way, but we really don't know. Is that where we are?
I suppose it would be useful to understand what happened to all the "treats" edges in KG2? Were 100% transformed to treats_or_applied_or_studied_to_treat? Or were they split into multiple predicates based on what we really know? I'm thinking that having those stats in hand would be useful.
so at the KG2.10.0c rollout party just now, @edeutsch, @sundareswarpullela, and I agreed that it's probably worth rolling out the proposed patch today (even though it's not an ideal solution) so that KG2 is still contributing to MVP1 for the Fugu release. then next week we can have a discussion about the treats predicate in KG2pre after @saramsey is back, and hopefully come up with a better long-term solution.
the patch will basically be: when a "treats + knowledge_type=inferred" query comes in, ARAX will query KG2 using the higher level treats_or_applied_or_studied_to_treat predicate, and then alter the treats_or_applied_or_studied_to_treat edges it returns so that they use the treats predicate. not great but better than KG2 not returning anything..
and we'll implement this patch such that it can be turned off with a one-line change, so if we decide we don't like this solution, we can easily go back to the way things currently are.
alright, I implemented the patch, tested it, and just merged it to master. ready for rollout when you have a chance @edeutsch!
Thanks @amykglen ! I deployed it to devED but am having some problems. New code deployed:
ISSUE_TEMPLATES/kg2rollout.md | 8 +++++---
code/ARAX/ARAXQuery/ARAX_expander.py | 29 ++++++++++++++++++++++++-----
code/ARAX/ARAXQuery/Expand/trapi_querier.py | 12 +++++++++++-
code/ARAX/test/test_ARAX_expand.py | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
When I run Example 2 JSON, the process crashes with:
2024-07-26T02:11:23.927411 DEBUG: (603) [] infores:service-provider-trapi: Qnodes with an implied parent query ID are: {'on': 'MONDO:0015564'}
2024-07-26T02:11:23.927507 DEBUG: (603) [] infores:service-provider-trapi: All on curies use prefix(es) infores:service-provider-trapi supports; no conversion necessary
2024-07-26T02:11:23.927671 DEBUG: (603) [] infores:service-provider-trapi: Sending query to infores:service-provider-trapi API (https://api.bte.ncats.io/v1/team/Service%20Provider)
2024-07-26T02:11:23.928353 INFO: (603) [] Expanding qedge t_edge using infores:molepro
2024-07-26T02:11:23.928834 DEBUG: (603) [] infores:molepro: Qnodes with an implied parent query ID are: {'on': 'MONDO:0015564'}
2024-07-26T02:11:23.928935 DEBUG: (603) [] infores:molepro: All on curies use prefix(es) infores:molepro supports; no conversion necessary
2024-07-26T02:11:23.929088 DEBUG: (603) [] infores:molepro: Sending query to infores:molepro API (https://translator.broadinstitute.org/molepro/trapi/v1.5)
Traceback (most recent call last):
File "/mnt/data/orangeboard/devED/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/query_controller.py", line 57, in run_query_dict_in_child_process
for json_string in json_string_generator:
File "/mnt/data/orangeboard/devED/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/../../../../ARAX/ARAXQuery/ARAX_query.py", line 134, in query_return_stream
yield(json.dumps(i_message_obj, allow_nan=False) + "\n")
File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type QEdge is not JSON serializable
Exception in query_controller.run_query_dict_in_child_process: <class 'TypeError'>
None
INFO: Deleted ARAXOngoingQuery.query_id=7711029
I am uncertain of what to make of this. Ideas?
maybe a logging message includes a QEdge object instead of a string?
ah yes, here, I think: https://github.com/RTXteam/RTX/blob/15fabe9fe40c5e4dd911515fc398831bc3f26653/code/ARAX/ARAXQuery/Expand/trapi_querier.py#L93
I will try fixing that unless you wave me off..
yes! I commented out that line and that fixed it.
Overall I think this patch is a good bandaid. Before: https://arax.ncats.io/test/?r=252443 (2 results)
After: https://arax.ncats.io/devED/?r=252445 (6 results)
I think this is good. I propose we keep and deploy this, But a much deeper conversation about a formally proper fix is needed for Sprint 5.
thanks for fixing that logging issue - weird that didn't produce any issues with the pytest suite.. I just went in and deleted that line as it really isn't needed anyway
and great! yes, hopefully we can discuss a longer-term fix to the treats issue at next week's AHM
it probably didn't cause an error during pytest because the logging data isn't serialized to JSON there. It was during the JSON serialization that things went bad. perhaps a more enduring fix is for the logger to stringify all of its inputs always first.
Heads up, I added some information about the treats refactor that speaks to some of the issues raised in this ticket, in this Translator Feedback Repo comment.
And at least in KG2.9.2, it appears there are no treats edges, just the mixin
Confirmed. At present, there are no places in RTX-KG2 where we are using biolink:treats.
I do note that there used to be DRUGBANK:treats, but it is commented out in predicate-remap.yaml. @ecwood do you know if that is meaning that we are losing out on DRUGBANK:treats edges?
Thank you @edeutsch and @amykglen and @dkoslicki
I do note that there used to be DRUGBANK:treats, but it is commented out in predicate-remap.yaml. @ecwood do you know if that is meaning that we are losing out on DRUGBANK:treats edges?
@saramsey I apologize for the delay on determining this, this took me down a long rabbit hole to determine where those edges went. As far as I can tell, DRUGBANK:treats edges haven't been in KG2 for 2 years (since https://github.com/RTXteam/RTX-KG2/issues/242). At first, I thought this was an issue with the current version of DrugBank. However, I tried running our current script on an older version of DrugBank still in the S3 bucket and those edges were still not generated. I am not sure when that predicate mapping added to KG2 because it does not appear in my initial emails about DrugBank predicate mapping to you from June 2020. It is possible that those edges never existed.
Also, it looks like the only edge type mapped to biolink:treats post refactor is:
MONDO:disease_responds_to:
operation: invert
core_predicate: biolink:treats
This source relation does not appear to be in KG2.9.2. However, it does seem to be in KG2.10.0 since there are 12 biolink:treats edges in it. (Counts obtained by running MATCH p=()-[r:biolink:treats]->() RETURN count(p) on kg2endpoint-kg2-9-2.rtx.ai and kg2endpoint-kg2-10-0.rtx.ai). By running MATCH ()-[r:biolink:treats]->() RETURN count(r), r.primary_knowledge_source, I determined that all of these are from MONDO. I am unclear whether these edges were missed in the treats refactor or whether this was intended.
@chunyuma Just a head's up when retraining xDTD, these changes will impact your model if/when you filter out treats edges from KG2. Probably best to consider treats_or_studied_to_treat... mixin and all its descendants in Biolink as well.
ok, the patch to the patch discussed at today's AHM is in branch 2328patch! it's tested and passing all pytests.
working on this, I noticed a bug with the prior patch (due to Biolink predicate tree weirdness) where Drug Central applied_to_treat edges weren't making it into answers for inferred treats queries. so that's been fixed as well... (in the 2328patch branch)
I also added an attribute to edges that ARAX alters that records KG2's original predicate for that edge. I'm kind of wondering if this is enough vs. adding the entire original edge in a support graph, since the only thing ARAX alter's about the edge is the predicate... it looks like this:
{'attribute_source': 'infores:arax',
'attribute_type_id': 'biolink:original_predicate',
'description': 'Predicate as it appears in RTX-KG2, prior to '
'alteration by ARAX.',
'value': 'biolink:applied_to_treat',
'value_type_id': 'biolink:predicate'}
this is great, thanks!
I think think your proposed annotation is good and fine. others can take issue with it if they notice it and bring it up.
I think it is temporary anyway until the Drug Approvals KP is in prime time, at which point I think it would be appropriate to back all this out?
great - I just merged it into master (since we've already completed the master --> itrb-test merge for Guppy), so we should get weekend testing results with the patch-to-the-patch in place.
just rolled out to arax2.ncats.io
In KG2.10 there are no more treats only _applied_to_treat.... The "patch" maps these predicates to treat for non-Semmed edges Question is: With Drug Approvals KP now avail, do we back out that patch @amykglen will try to answer that If good, then as soon as Drug Approvals in CI, then back out the patch and recheck.