ARAX returning results with predicates that are not supported by Biolink Model 3.1.2
I looked into this very briefly so I thought I'd start an issue locally.
Related to this: https://github.com/NCATSTranslator/Feedback/issues/118
It seems like there are two possibilities:
- KG2.8.0 contains "gene_associated_with_condition", which apparently it should not
- They are querying production KG2.7.6 instead of CI KG2.8.0
The query in question appears to be:
| 827077 | 2023-02-15 19:42:47 | 40 | ars.ci.transltr.io | 52.4.10.150, 10.11.0.190 | arax.ci.transltr.io | arax-0 | ARAX | 5094 | 129227 | ✓ Completed | OK | Normal completion with 24 results. |
|---|
And the result is at https://arax.ncats.io/?r=129227 (raw response: https://arax.ncats.io/api/arax/v1.3/response/129227)
As far as I can tell they are querying CI with KG2.8.0, so I think that rules out option 2. Although I am not really certain.
There is also potentially option 3:
3) As far as we were aware, the ask was for gene-chemical associations:
not for gene-disease associations. So this is not yet done.
That's about all I know. Tossing this out for others more knowledgeable..
Hi @edeutsch thank you for reporting this issue in the RTX project area so we can track it locally. I guess my first order of business is to try and ascertain, when a query is posted on ui.ci.transltr.io, which ARAX endpoint is queried? How would we know?
I'm uncertain that I understand the question correctly, but the endpoint is likely to be /async_query. But could be /query. I think an easy way to know is to look at the second log entry, and if it is "asynchronous Query launching on incoming Query", then you know that the endpoint was /async_query
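For what it's worth, that log check could be sketched roughly as follows. This assumes the response JSON has a top-level `logs` list whose entries carry a `message` field (the usual TRAPI layout); the function name `guess_endpoint` is made up for illustration, and falling back to `/query` when the marker is absent is only a guess.

```python
# Marker text quoted from the comment above
ASYNC_MARKER = "asynchronous Query launching on incoming Query"


def guess_endpoint(response: dict) -> str:
    """Guess which endpoint served a response by inspecting its second log entry."""
    logs = response.get("logs") or []
    if len(logs) < 2:
        return "unknown"  # too few log entries to decide
    message = logs[1].get("message") or ""
    # Assumption: anything without the marker came in via /query
    return "/async_query" if ASYNC_MARKER in message else "/query"
```

To use it, load the JSON from the response URL and pass the parsed dict to `guess_endpoint`.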
Thanks @edeutsch. I didn't phrase my question well. I meant, what ARAX API base URL is being hit? How do we know that it is an ARAX installation that is configured to query RTX-KG2.8.0c?
Well, you can see in the OP that arax.ci.transltr.io is being hit. And arax.ci.transltr.io should be hitting KG2.8.0c. But it is not trivial to be sure. I sifted through the logs to find a tell and did not come up with one. I thought the logs listed exactly which KP endpoint URLs are being hit, but I didn't see it there. Maybe I just missed it.
I just went to arax.ci.transltr.io and issued our example query and only 17 results came back, which I think is the surest sign that arax.ci.transltr.io is using KG2.8.0c right now (KG2.7.6c returns >100). But was it 2.8.0c when the initial query was sent? I'm not 100% certain. I think so, but there is room for uncertainty here. Maybe we need to consider augmenting the log messages to make this clearer.
Thanks for your reply. Because I cannot remember any of these details, I took a look at this Google Sheet which seems to aim to disambiguate the various ARAX instances:
https://docs.google.com/spreadsheets/d/1eC3GrRW6gw5zn7XKjvaHD9KCulO-GqEaTJ3mG6PgOOY/edit#gid=0
Isn't KG2.8.0c only served up in development maturity level ARAX installations?
I would imagine that arax.ci.transltr.io would be querying the RTX-KG2 service on kg2.ci.transltr.io. As far as I know (but maybe I am out-of-date), that instance is running code from the master branch of the RTX project area, whereas I thought that, in the RTX project area, all the KG2.8.0c stuff was in the kg2-integration branch. Paging @amykglen and @acevedol for a sanity check on what I just wrote.
Ah, wait, I see that the master branch of RTX has file code/config_dbs.json which is clearly referencing KG2.8.0c,
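A quick way to double-check this from a checkout could look like the sketch below. I am not assuming anything about the internal layout of `code/config_dbs.json`; the helper (hypothetical name `kg2_versions_in`) just walks whatever JSON it is given and collects strings that look like KG2 version tags.

```python
import json
import re

# Matches tags such as "KG2.8.0" or "KG2.8.0c"
KG2_VERSION = re.compile(r"KG2\.\d+\.\d+[a-z]*")


def kg2_versions_in(obj) -> set:
    """Recursively collect KG2 version tags from the keys and values of parsed JSON."""
    found = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            found |= kg2_versions_in(key)
            found |= kg2_versions_in(value)
    elif isinstance(obj, list):
        for item in obj:
            found |= kg2_versions_in(item)
    elif isinstance(obj, str):
        found.update(match.group(0) for match in KG2_VERSION.finditer(obj))
    return found


def kg2_versions_in_file(path: str) -> set:
    """Load a JSON file (e.g. code/config_dbs.json) and report the KG2 tags it mentions."""
    with open(path, encoding="utf-8") as handle:
        return kg2_versions_in(json.load(handle))
```

If more than one version tag comes back, that alone would be worth a closer look.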

It is my understanding:

> I would imagine that arax.ci.transltr.io would be querying the RTX-KG2 service on kg2.ci.transltr.io.

correct.

> As far as I know (but maybe I am out-of-date), that instance is running code from the master branch of the RTX project area,

correct.

> whereas I thought that, in the RTX project area, all the KG2.8.0c stuff was in the kg2-integration branch.

This is true, but it is also now in master. So anything running master should have KG2.8.0.

> Paging @amykglen and @acevedol for a sanity check on what I just wrote.

Thanks @edeutsch. So, the view from the synonymizer supports your expectation that arax.ci.transltr.io is indeed backed by the RTX-KG2.8.0c KP:
> - KG2.8.0 contains "gene_associated_with_condition", which apparently it should not

Well, gene associated with condition is in the Biolink 3.0 spec (the version of Biolink against which RTX-KG2.8.0pre was built). Here is the permalink:
https://github.com/biolink/biolink-model/blob/1efe2ed5a738f9cb4c32566f8cb7e713f62fa1ab/biolink-model.yaml#L4420
That predicate is also not deprecated:
Note, the predicate gene associated with condition is also in Biolink 3.1.2, and also (in that release) not deprecated:

Also, I note that in NCATSTranslator/Feedback issue 118, they didn't boldface gene_associated_with_condition, so I am not sure that is the predicate they were raising an issue about?
Now, as to KCNH3 [increases_activity_of] Gentamycin, that's another story. That edge does indeed seem to be coming from "RTX-KG2",
ah, sorry, my error, I mis-inferred which predicate they were unhappy about
But RTX-KG2.8.0pre doesn't have that predicate!
And yes, in the above query, I am talking to RTX-KG2.8.0pre:
So, my friends, we have a bit of a mystery on our hands. We have an ARAX result-set that supposedly comes from a query being executed via ui.ci.transltr.io,
https://arax.ncats.io/?r=cccb6699-29f4-492b-9733-ab77bd1a8261
that pulls up a collection of edges that includes a Biolink 2.X.X-era predicate (i.e., biolink:increases_activity_of). Which seems to imply that somewhere under the hood, that query is being serviced by an RTX-KG2 KP that is backed by KG2.7.6c and not KG2.8.0c. But how?
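To see how many edges in a saved response carry such a predicate, something like this sketch might help. It assumes the standard TRAPI layout (`message.knowledge_graph.edges` as a dict keyed by edge ID, each edge with `subject`, `predicate` and `object`); the function name is hypothetical.

```python
def edges_with_predicates(response: dict, flagged: set) -> list:
    """Return (edge_id, subject, predicate, object) for edges whose predicate is flagged."""
    message = response.get("message") or {}
    knowledge_graph = message.get("knowledge_graph") or {}
    edges = knowledge_graph.get("edges") or {}
    return [
        (edge_id, edge.get("subject"), edge.get("predicate"), edge.get("object"))
        for edge_id, edge in edges.items()
        if edge.get("predicate") in flagged
    ]
```

Calling it with `{"biolink:increases_activity_of"}` as the flagged set would list the edges in question.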
Here is the JSON query:

    {
      "edges": {
        "N1": {
          "attribute_constraints": [],
          "object": "sn",
          "predicates": [
            "biolink:has_normalized_google_distance_with"
          ],
          "qualifier_constraints": [],
          "subject": "on"
        },
        "creative_DTD_qedge_0": {
          "attribute_constraints": [],
          "exclude": false,
          "object": "creative_DTD_qnode_0",
          "option_group_id": "creative_DTD_option_group_0",
          "qualifier_constraints": [],
          "subject": "sn"
        },
        "creative_DTD_qedge_1": {
          "attribute_constraints": [],
          "exclude": false,
          "object": "creative_DTD_qnode_1",
          "option_group_id": "creative_DTD_option_group_0",
          "qualifier_constraints": [],
          "subject": "creative_DTD_qnode_0"
        },
        "creative_DTD_qedge_2": {
          "attribute_constraints": [],
          "exclude": false,
          "object": "on",
          "option_group_id": "creative_DTD_option_group_0",
          "qualifier_constraints": [],
          "subject": "creative_DTD_qnode_1"
        },
        "t_edge": {
          "attribute_constraints": [],
          "knowledge_type": "inferred",
          "object": "on",
          "predicates": [
            "biolink:treats"
          ],
          "qualifier_constraints": [],
          "subject": "sn"
        }
      },
      "nodes": {
        "creative_DTD_qnode_0": {
          "constraints": [],
          "is_set": true,
          "option_group_id": "creative_DTD_option_group_0"
        },
        "creative_DTD_qnode_1": {
          "constraints": [],
          "is_set": true,
          "option_group_id": "creative_DTD_option_group_0"
        },
        "on": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [
            "MONDO:0007972"
          ],
          "is_set": false
        },
        "sn": {
          "categories": [
            "biolink:NamedThing"
          ],
          "constraints": [],
          "is_set": false
        }
      }
    }

I am posting this DSL query to arax.ci.transltr.io right now, to see what we get:

    add_qnode(ids=CHEMBL.COMPOUND:CHEMBL643, key=n0)
    add_qnode(categories=biolink:Protein, key=n1)
    add_qedge(subject=n0, object=n1, key=e0)
    expand(kp=infores:rtx-kg2)
    resultify()
    filter_results(action=limit_number_of_results, max_results=100)

I am hoping to pull up the CHEMBL.COMPOUND:CHEMBL643--UniProtKB:Q9ULD8 edge so I can examine the predicate. Fingers crossed....
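Once the response JSON is saved, picking out that edge could be done with a small helper like the one below. This is only a sketch: it assumes the TRAPI knowledge graph layout (`edges` as a dict keyed by edge ID), the name `edges_between` is made up, and the node IDs in the response may have been canonicalized to different CURIEs than the ones typed into the query.

```python
def edges_between(knowledge_graph: dict, node_a: str, node_b: str) -> dict:
    """Return {edge_id: predicate} for edges joining two nodes, in either direction."""
    wanted = {node_a, node_b}
    return {
        edge_id: edge.get("predicate")
        for edge_id, edge in (knowledge_graph.get("edges") or {}).items()
        if {edge.get("subject"), edge.get("object")} == wanted
    }
```

For example, `edges_between(response["message"]["knowledge_graph"], "CHEMBL.COMPOUND:CHEMBL643", "UniProtKB:Q9ULD8")` would show every predicate asserted between the two nodes.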
So, here is the edge in question. Note the Biolink-3.0-compatible predicate:
And if we click on the edge, we see:
This latest evidence motivates me to ask, what exactly tells us that ui.ci.transltr.io is querying arax.ci.transltr.io specifically (and not some other ARAX instance) via the ARS?
aha, wait a sec. The query_graph you show above has lots of "creative_DTD" stuff in it (that I don't fully understand).. what if this edge is coming out of DTD results? and not from KG2.8.0c itself?
Would DTD be giving an edge with `biolink:increases_activity_of` as the predicate?
Wouldn't DTD leave some trace in the edge attributes (like EPC type stuff?), that it was a predicted edge?
I don't know. Perhaps my idea is preposterous. But sometimes that's all I've got. Quite often, actually.. I think we may need @amykglen and @chunyuma to come to the rescue..
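One way to tell at a glance whether a query graph went down the inferred/DTD path is sketched below. It relies only on what is visible in the query graph posted above: a `knowledge_type` of `inferred` on a query edge, and keys starting with `creative_DTD`. Both function names are hypothetical, and whether that key prefix is a stable convention is an assumption.

```python
def inferred_query_edges(query_graph: dict) -> list:
    """Return the keys of query edges marked knowledge_type == "inferred"."""
    return sorted(
        key
        for key, qedge in (query_graph.get("edges") or {}).items()
        if qedge.get("knowledge_type") == "inferred"
    )


def dtd_generated_keys(query_graph: dict, prefix: str = "creative_DTD") -> list:
    """Return node and edge keys that start with the given prefix."""
    keys = list(query_graph.get("nodes") or {}) + list(query_graph.get("edges") or {})
    return sorted(key for key in keys if key.startswith(prefix))
```

On the query graph above, the first function would report `t_edge` and the second would report the three `creative_DTD_qedge_*` keys plus the two `creative_DTD_qnode_*` keys.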
So, I have confirmed that in KG2.7.6c, the old (Biolink-2.X.X-era) predicate is there:
At this point, I am fairly confident that, somehow, the cached result posted in NCATSTranslator/Feedback issue 118 is showing an edge from RTX-KG2.7.6c.
so I think because the original query was a `knowledge_type: inferred` query, I don't think RTX-KG2 is actually being queried directly as a KP (at least in the usual sense).
I think it's XDTD that added all those KG2 edges, as @edeutsch suggested - and I don't know the details of how XDTD does that. not sure if it queries KG2 at runtime? but certainly those edges are from an older KG2 version. @chunyuma or @dkoslicki would know the details about where those edges are coming from.
of note, the edges added by XDTD seem to be lacking attributes (except for one attribute each, indicating they came from KG2); that probably isn't ideal. at the very least we should be 'tagging' those edges to make it clear they were added by XDTD, I'd think?
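To get a list of the edges that look this bare, a rough audit could be written as below. It assumes the TRAPI knowledge graph layout, where each edge may carry an `attributes` list; the function name and the default threshold of one attribute are just illustrative choices.

```python
def sparsely_attributed_edges(knowledge_graph: dict, max_attributes: int = 1) -> list:
    """Return IDs of edges that have at most `max_attributes` attributes."""
    return [
        edge_id
        for edge_id, edge in (knowledge_graph.get("edges") or {}).items()
        # A missing or null attributes field counts as zero attributes
        if len(edge.get("attributes") or []) <= max_attributes
    ]
```

This does not prove an edge came from XDTD, it only narrows down where to look.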
Thank you, @amykglen. Agreed on all points.
Hi @chunyuma when you have a moment, could you please weigh in on this? We want to know if the DTD module could be adding edges to the query-specific KG, with predicates like biolink:increases_activity_of.
Sorry for the delay @saramsey and @amykglen. The xDTD model does not query KG2 at run-time. This is likely the result of the xDTD model having been trained on a previous version of KG2, whose predicates have since changed. @chunyuma said he can look into how to address that. Any idea what the priority level is for this?
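A simple guard against this kind of drift might look like the sketch below. I don't know how xDTD actually stores its edges, so the input shapes here are assumptions: `model_edges` is taken to be an iterable of dicts with a `predicate` key, and `current_predicates` is whatever set of predicates the current KG2 build exposes. The function name is hypothetical.

```python
from collections import Counter


def stale_predicate_counts(model_edges, current_predicates) -> Counter:
    """Count model edges whose predicate is absent from the current KG2 predicate set."""
    current = set(current_predicates)
    return Counter(
        edge["predicate"]
        for edge in model_edges
        if edge["predicate"] not in current
    )
```

A non-empty result after a KG2 upgrade would suggest the model needs retraining or its output needs remapping before edges are returned.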