RTX icon indicating copy to clipboard operation
RTX copied to clipboard

KG2 suddenly non-responsive to `treats` predicate

Open dkoslicki opened this issue 1 year ago • 24 comments

Note in this result: https://arax.ci.transltr.io/?r=252131, RTX-KG2C says MetaKG indicates this qedge is unsupported. The query asks for the treats predicate. It appears that http://kg2canonicalized2.rtx.ai:7474/browser/ says there are no treats predicates in KG2C but rather only biolink:treats_or_applied_or_studied_to_treat which is a superclass of treats, so it seems to be a subclass reasoning issue.

High priority since we really only have until Friday to fix this.

dkoslicki avatar Jul 24 '24 17:07 dkoslicki

That was KG2.9.2 version, btw

dkoslicki avatar Jul 24 '24 18:07 dkoslicki

so I think our predicate reasoning is working appropriately - treats is a descendant of treats_or_applied_or_studied_to_treat, which means that when someone asks for a treats edge, we are only allowed to respond with edges that have the predicate treats or a descendant of treats (not ancestor; if ancestors were included then that would mean we could respond with related_to edges as well! which makes no sense).

so if someone asks for treats_or_applied_or_studied_to_treat in their query graph, then we can answer with treats edges or with treats_or_applied_or_studied_to_treat edges. but not the other way around.

does that make sense? in my understanding nothing is wrong with our predicate reasoning here..

amykglen avatar Jul 24 '24 21:07 amykglen

okay, thanks @amykglen that all seems reasonable to me. Yet, we still need a solution to this problem.

I think we somehow need to be able to to walk up to ancestors somewhat judiciously when the query is "treats + knowledge_type=inferred" (which is what the UI does). When someone asks "which chemicals may treat disease X"? and knowledge_type=inferred, then we need to somehow be able to return these mixin edges.

Said another way, it seems to me that before the treats refactor, we were able to return information that drug X treats disease Y because DrugBank says so. But now that's all gone, and now everything in KG2 is treats_or_applied_or_studied_to_treat? and we're not sure about any treatments anymore? This seems like a big loss. I suppose if we get treats with knowledge_type=lookup, then we should only go with what we're sure of. but if we're not sure of anything, then that's a big loss.

I don't know what the solution here is exactly, but I think one is urgently needed. If I understand correctly, essentially KG2 is completely out of the game for MVP1 at this point.

edeutsch avatar Jul 24 '24 23:07 edeutsch

ah, I see. I'm not familiar with the details of the treats refactor in KG2pre, but I would assume we're now required to use treats_or_applied_or_studied_to_treat for all former treats edges, since that's the transformation that happened (maybe @ecwood can confirm that we're not allowed to use the treats predicate for those edges anymore). I guess I was assuming that with the whole treats refactor the MVP queries would be updated to use treats_or_applied_or_studied_to_treat, but I guess that's not the case!

it should be quite easy technically to add an exception to our predicate reasoning like you mentioned, @edeutsch, where we could also look for treats_or_applied_or_studied_to_treat edges when a "treats + knowledge_type=inferred" query comes in. does that seem like a good (at least) temporary solution? I can do this today and we can roll it out tomorrow along with KG2.10.0c. that would buy us a little time to discuss with the team whether it's an option to use the treats predicate in KG2pre and so on.

amykglen avatar Jul 25 '24 00:07 amykglen

though I suppose people might not like seeing treats_or_applied_or_studied_to_treat edges in their answer when they asked for treats... not sure if that's an issue. I suppose we could also have a patch to transform the output edges to treats, but then that's kind of undoing the whole treats refactor, ha. I agree the solution is unclear here - it might help to have some insight from the KG2pre team

amykglen avatar Jul 25 '24 00:07 amykglen

Yeah, this refactor seems a bit of a mess. From Chris B: it appears that treats will still be the predicate for MVP1, and treats needs to be the predicate in the top level result response. treats_or_applied_to_treat... edges can then be on support graphs or the like.

dkoslicki avatar Jul 25 '24 00:07 dkoslicki

And at least in KG2.9.2, it appears there are no treats edges, just the mixin

dkoslicki avatar Jul 25 '24 00:07 dkoslicki

hmm, very odd. I suppose we could argue that it's ok for ARAX to, in the case of an inferred query, consider KG2's treats_or_applied_or_studied_to_treat edges to be treats edges? that is, ask KG2 for treats_or_applied_or_studied_to_treat edges and then convert their predicates to treats in the answer KG? I suppose it is 'creative' mode after all, ha...

amykglen avatar Jul 25 '24 00:07 amykglen

The "treats" refactor was done before I arrived for the summer, so I am unfamiliar with the details of it. @saramsey likely has more information regarding its execution in KG2pre

ecwood avatar Jul 25 '24 01:07 ecwood

I am also a bit unfamiliar with the details of the treats refactor, but I was under the impression that "treats" was split into multiple predicates depending on what was really known. When it is well established that metformin treats type 2 diabetes, I'm under the impression that that should remain a "treats" edge in KG2. it is just when it is unclear whether the source is telling us that this is well-established treatment vs. currently undergoing clinical trials vs. being used by some as an off-label way to treat, then we should start using other predicates.

So I am inferring that if KG2 no longer has any "treats" edges, that means that we no longer have well-established treatments in KG2? All we have is a large pile of relationships that might be established, or they might just be undergoing trials, or they might be sometimes applied in a off-label way, but we really don't know. Is that where we are?

I suppose it would be useful to understand what happened to all the "treats" edges in KG2? Were 100% transformed to treats_or_applied_or_studied_to_treat? Or were they split into multiple predicates based on what we really know? I'm thinking that having those stats in hand would be useful.

edeutsch avatar Jul 25 '24 01:07 edeutsch

so at the KG2.10.0c rollout party just now, @edeutsch, @sundareswarpullela, and I agreed that it's probably worth rolling out the proposed patch today (even though it's not an ideal solution) so that KG2 is still contributing to MVP1 for the Fugu release. then next week we can have a discussion about the treats predicate in KG2pre after @saramsey is back, and hopefully come up with a better long-term solution.

the patch will basically be: when a "treats + knowledge_type=inferred" query comes in, ARAX will query KG2 using the higher level treats_or_applied_or_studied_to_treat predicate, and then alter the treats_or_applied_or_studied_to_treat edges it returns so that they use the treats predicate. not great but better than KG2 not returning anything..

and we'll implement this patch such that it can be turned off with a one-line change, so if we decide we don't like this solution, we can easily go back to the way things currently are.

amykglen avatar Jul 25 '24 20:07 amykglen

alright, I implemented the patch, tested it, and just merged it to master. ready for rollout when you have a chance @edeutsch!

amykglen avatar Jul 26 '24 00:07 amykglen

Thanks @amykglen ! I deployed it to devED but am having some problems. New code deployed:

 ISSUE_TEMPLATES/kg2rollout.md               |  8 +++++---
 code/ARAX/ARAXQuery/ARAX_expander.py        | 29 ++++++++++++++++++++++++-----
 code/ARAX/ARAXQuery/Expand/trapi_querier.py | 12 +++++++++++-
 code/ARAX/test/test_ARAX_expand.py          | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

When I run Example 2 JSON, the process crashes with:

2024-07-26T02:11:23.927411 DEBUG: (603) [] infores:service-provider-trapi: Qnodes with an implied parent query ID are: {'on': 'MONDO:0015564'}
2024-07-26T02:11:23.927507 DEBUG: (603) [] infores:service-provider-trapi: All on curies use prefix(es) infores:service-provider-trapi supports; no conversion necessary
2024-07-26T02:11:23.927671 DEBUG: (603) [] infores:service-provider-trapi: Sending query to infores:service-provider-trapi API (https://api.bte.ncats.io/v1/team/Service%20Provider)
2024-07-26T02:11:23.928353 INFO: (603) [] Expanding qedge t_edge using infores:molepro
2024-07-26T02:11:23.928834 DEBUG: (603) [] infores:molepro: Qnodes with an implied parent query ID are: {'on': 'MONDO:0015564'}
2024-07-26T02:11:23.928935 DEBUG: (603) [] infores:molepro: All on curies use prefix(es) infores:molepro supports; no conversion necessary
2024-07-26T02:11:23.929088 DEBUG: (603) [] infores:molepro: Sending query to infores:molepro API (https://translator.broadinstitute.org/molepro/trapi/v1.5)
Traceback (most recent call last):
  File "/mnt/data/orangeboard/devED/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/query_controller.py", line 57, in run_query_dict_in_child_process
    for json_string in json_string_generator:
  File "/mnt/data/orangeboard/devED/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/../../../../ARAX/ARAXQuery/ARAX_query.py", line 134, in query_return_stream
    yield(json.dumps(i_message_obj, allow_nan=False) + "\n")
  File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/mnt/data/python/Python-3.9.16/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type QEdge is not JSON serializable
Exception in query_controller.run_query_dict_in_child_process: <class 'TypeError'>
None
INFO: Deleted ARAXOngoingQuery.query_id=7711029

I am uncertain of what to make of this. Ideas?

edeutsch avatar Jul 26 '24 02:07 edeutsch

maybe a logging message includes a QEdge object instead of a string?

edeutsch avatar Jul 26 '24 02:07 edeutsch

ah yes, here, I think: https://github.com/RTXteam/RTX/blob/15fabe9fe40c5e4dd911515fc398831bc3f26653/code/ARAX/ARAXQuery/Expand/trapi_querier.py#L93

I will try fixing that unless you wave me off..

edeutsch avatar Jul 26 '24 02:07 edeutsch

yes! I commented out that line and that fixed it.

Overall I think this patch is a good bandaid. Before: https://arax.ncats.io/test/?r=252443 (2 results)

After: https://arax.ncats.io/devED/?r=252445 (6 results)

I think this is good. I propose we keep and deploy this, But a much deeper conversation about a formally proper fix is needed for Sprint 5.

edeutsch avatar Jul 26 '24 02:07 edeutsch

thanks for fixing that logging issue - weird that didn't produce any issues with the pytest suite.. I just went in and deleted that line as it really isn't needed anyway

and great! yes, hopefully we can discuss a longer-term fix to the treats issue at next week's AHM

amykglen avatar Jul 26 '24 03:07 amykglen

it probably didn't cause an error during pytest because the logging data isn't serialized to JSON there. It was during the JSON serialization that things went bad. perhaps a more enduring fix is for the logger to stringify all of its inputs always first.

edeutsch avatar Jul 26 '24 04:07 edeutsch

Heads up, I added some information about the treats refactor that speaks to some of the issues raised in this ticket, in this Translator Feedback Repo comment.

mbrush avatar Jul 27 '24 00:07 mbrush

And at least in KG2.9.2, it appears there are no treats edges, just the mixin

Confirmed. At present, there are no places in RTX-KG2 where we are using biolink:treats.

I do note that there used to be DRUGBANK:treats, but it is commented out in predicate-remap.yaml. @ecwood do you know if that is meaning that we are losing out on DRUGBANK:treats edges?

saramsey avatar Jul 27 '24 19:07 saramsey

Thank you @edeutsch and @amykglen and @dkoslicki

saramsey avatar Jul 27 '24 19:07 saramsey

I do note that there used to be DRUGBANK:treats, but it is commented out in predicate-remap.yaml. @ecwood do you know if that is meaning that we are losing out on DRUGBANK:treats edges?

@saramsey I apologize for the delay on determining this, this took me down a long rabbit hole to determine where those edges went. As far as I can tell, DRUGBANK:treats edges haven't been in KG2 for 2 years (since https://github.com/RTXteam/RTX-KG2/issues/242). At first, I thought this was an issue with the current version of DrugBank. However, I tried running our current script on an older version of DrugBank still in the S3 bucket and those edges were still not generated. I am not sure when that predicate mapping added to KG2 because it does not appear in my initial emails about DrugBank predicate mapping to you from June 2020. It is possible that those edges never existed.

ecwood avatar Jul 29 '24 09:07 ecwood

Also, it looks like the only edge type mapped to biolink:treats post refactor is:

MONDO:disease_responds_to:
  operation: invert
  core_predicate: biolink:treats

This source relation does not appear to be in KG2.9.2. However, it does seem to be in KG2.10.0 since there are 12 biolink:treats edges in it. (Counts obtained by running MATCH p=()-[r:biolink:treats]->() RETURN count(p) on kg2endpoint-kg2-9-2.rtx.ai and kg2endpoint-kg2-10-0.rtx.ai). By running MATCH ()-[r:biolink:treats]->() RETURN count(r), r.primary_knowledge_source, I determined that all of these are from MONDO. I am unclear whether these edges were missed in the treats refactor or whether this was intended.

ecwood avatar Jul 29 '24 10:07 ecwood

@chunyuma Just a head's up when retraining xDTD, these changes will impact your model if/when you filter out treats edges from KG2. Probably best to consider treats_or_studied_to_treat... mixin and all its descendants in Biolink as well.

dkoslicki avatar Jul 29 '24 16:07 dkoslicki

ok, the patch to the patch discussed at today's AHM is in branch 2328patch! it's tested and passing all pytests.

working on this, I noticed a bug with the prior patch (due to Biolink predicate tree weirdness) where Drug Central applied_to_treat edges weren't making it into answers for inferred treats queries. so that's been fixed as well... (in the 2328patch branch)

I also added an attribute to edges that ARAX alters that records KG2's original predicate for that edge. I'm kind of wondering if this is enough vs. adding the entire original edge in a support graph, since the only thing ARAX alter's about the edge is the predicate... it looks like this:

                {'attribute_source': 'infores:arax',
                 'attribute_type_id': 'biolink:original_predicate',
                 'description': 'Predicate as it appears in RTX-KG2, prior to '
                                'alteration by ARAX.',
                 'value': 'biolink:applied_to_treat',
                 'value_type_id': 'biolink:predicate'}

amykglen avatar Aug 22 '24 01:08 amykglen

this is great, thanks!

I think think your proposed annotation is good and fine. others can take issue with it if they notice it and bring it up.

I think it is temporary anyway until the Drug Approvals KP is in prime time, at which point I think it would be appropriate to back all this out?

edeutsch avatar Aug 23 '24 05:08 edeutsch

great - I just merged it into master (since we've already completed the master --> itrb-test merge for Guppy), so we should get weekend testing results with the patch-to-the-patch in place.

amykglen avatar Aug 23 '24 23:08 amykglen

just rolled out to arax2.ncats.io

edeutsch avatar Aug 24 '24 02:08 edeutsch

In KG2.10 there are no more treats only _applied_to_treat.... The "patch" maps these predicates to treat for non-Semmed edges Question is: With Drug Approvals KP now avail, do we back out that patch @amykglen will try to answer that If good, then as soon as Drug Approvals in CI, then back out the patch and recheck.

edeutsch avatar Aug 28 '24 17:08 edeutsch