RTX icon indicating copy to clipboard operation
RTX copied to clipboard

Allow unbound queries to RTX-KG2?

Open edeutsch opened this issue 3 years ago • 7 comments

On various calls, Sierra has been asking to do unbound queries. the use case is that (as a hypthetical example) in our /meta_knowledge_graph we have biolink:Publication - biolink:treats - biolink:Disease and Sierra wants to know which edges do we have for this relationship? (I know of several publications that could be used to treat insomnia, but I don't know if we have those triples). Sierra wants examples. But our current system rejects queries with no CURIE as would be required for her to explore this.

Should we consider allowing queries to RTX-KG2 (not ARAX?) without CURIEs? Do we need other safeguards to prevent explosions?

edeutsch avatar Jul 21 '22 18:07 edeutsch

perhaps in concert with a limiting mechanism: https://github.com/NCATSTranslator/OperationsAndWorkflows/issues/73

edeutsch avatar Jul 21 '22 19:07 edeutsch

I think it wouldn't be too big of a change for Expand to allow such queries if it just limited how many edges could be returned. we already have a hard-stop of 1,000,000 edges when querying Plover - we could reduce that to something more reasonable for unpinned queries... or plug in a user-specified threshold if a workflow operation is eventually added for that. (not sure of the best way to communicate that number to Expand()... another parameter?)

isn't there already a max_results slot for TRAPI queries? or is that only an ARAX thing and not official TRAPI... if there is, Expand should be able to inspect that value?

@rcpeene would likely need to adjust his new query graph splitting code, since that assumes at least one qnode is pinned - I'm thinking maybe to start, the graph splitter should just ignore query graphs where no qnodes are pinned (meaning, just return the input query graph without splitting it at all)...

amykglen avatar Jul 21 '22 21:07 amykglen

The max_results thing is only our own ARAX/RTX-KG2 thing, and I don't think we ever properly implemented it.

I vote we just let CURIE-less query_graphs proceed if RTX-KG2, and you just update Expand to allow these with some reasonable cutoff (10,000?) and see if Sierra likes it.

edeutsch avatar Jul 22 '22 02:07 edeutsch

@amykglen's suggestion would be very easy to implement in the new graph splitter module; If a query graph has no pinned nodes it would be no problem to just return the whole graph, unsplit. It's worth noting, then, that the original purpose of graph_splitter.py would be of no use for these type of queries. In other words, these type of queries may suffer some answer quality loss.

rcpeene avatar Jul 25 '22 22:07 rcpeene

from 8/3 discussion: this is high-ish priority (Sierra would like to have this)

amykglen avatar Aug 03 '22 21:08 amykglen

some notes for @rcpeene:

  • I think basically we'll want to change the main Expand code (in ARAX_expander.py) so it doesn't require query graphs to have >=1 pinned nodes (not sure how extensive changes related to that will be)
  • and then, for queries that are lacking a pinned node, change this max_allowed_edges threshold to something much smaller
  • there may also be some downstream tweaks needed in Resultify, but I can handle those (I think those shouldn't be too involved)
  • note that to test these changes you'll need to follow these steps (since you'll be editing KG2 API-specific code)

amykglen avatar Aug 31 '22 17:08 amykglen

I made those changes listed by @amykglen above, and pushed them to branch issue1874. I will enumerate them more specifically here.

  • I removed a check which would prevent running an unbound query in query_graph_info.py
  • I modified checks in four locations in ARAX_Expander.py which would prevent running an unbound query so that they only do so if Expand isn't run in RTXKG2 mode.
  • The first check, which is in the apply method, also sets a value which causes KG2Querier to be instantiated with a much lower max_allowed_edges. I set this to 20, but it could be easily changed.
  • I modified the constructor of KG2Querier in kg2_querier.py to accept an option max_allowed_edges parameter.

It is difficult to test these changes for two reasons. Firstly, for tests to successfully return results, I believe @amykglen will need to make and push her changes to resultify to the branch. Secondly, the changes only have a positive effect when Expand is run in RTXKG2 mode. When run in ARAX mode, the desired behavior is to cause an error if issued an unbound query. Since the tests all run Expand in ARAX mode first, some temporary hardcoding has to be done to prevent it from throwing an error.

Currently awaiting feedback from Amy and we can proceed with ensuring it's working smoothly.

rcpeene avatar Sep 08 '22 02:09 rcpeene