
Add a cached version of SRI info or find another way of speeding up dtd tests

finnagin opened this issue 4 years ago • 17 comments

See here: https://travis-ci.com/github/RTXteam/RTX/builds/231574952

There seem to be a bunch of these warnings before the failure:

WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance
WARNING: returned with status 404 while retrieving ancestors for biolink:ChemicalSubstance

Then the following traceback:

Response:

  status: ERROR

  n_errors: 1  n_warnings: 0  n_messages: 58

  error_code: UncaughtARAXiError   message: An uncaught error occurred: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')):

Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/opt/python/3.7.6/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/opt/python/3.7.6/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/opt/python/3.7.6/lib/python3.7/http/client.py", line 275, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/util/retry.py", line 410, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/opt/python/3.7.6/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/opt/python/3.7.6/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/opt/python/3.7.6/lib/python3.7/http/client.py", line 275, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/ARAX_query.py", line 533, in execute_processing_plan
    overlay.apply(response, action['parameters'])
  File "/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/ARAX_overlay.py", line 531, in apply
    'action'])()  # thank you https://stackoverflow.com/questions/11649848/call-methods-by-string
  File "/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/ARAX_overlay.py", line 847, in __predict_drug_treats_disease
    response = PDTD.predict_drug_treats_disease()
  File "/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/Overlay/predict_drug_treats_disease.py", line 196, in predict_drug_treats_disease
    converted_target_curie = self.convert_to_trained_curies(target_curie)
  File "/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/Overlay/predict_drug_treats_disease.py", line 139, in convert_to_trained_curies
    normalizer_result = self.synonymizer.get_canonical_curies(curies=[input_curie], return_all_categories=True)
  File "/home/travis/build/RTXteam/RTX/code/ARAX/ARAXQuery/../NodeSynonymizer/node_synonymizer.py", line 1963, in get_canonical_curies
    results[entity]['expanded_categories'] = category_manager.get_expansive_categories(row[4])
  File "/home/travis/build/RTXteam/RTX/code/ARAX/ARAXQuery/../NodeSynonymizer/category_manager.py", line 133, in get_expansive_categories
    ancestors = self.get_ancestors(category)
  File "/home/travis/build/RTXteam/RTX/code/ARAX/ARAXQuery/../NodeSynonymizer/category_manager.py", line 86, in get_ancestors
    response_content = requests.get(url, headers={'accept': 'application/json'})
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests_cache/core.py", line 136, in request
    **kwargs
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests_cache/core.py", line 109, in send
    return send_request_and_cache_response()
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests_cache/core.py", line 97, in send_request_and_cache_response
    response = super(CachedSession, self).send(request, **kwargs)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

finnagin • Jul 02 '21

The get_ancestors function calls the SRI for this information. It seemed like a good idea. There is some level of caching, which should help, but I suppose a new CI instance will not have any cache, so if the service is down, this fails.
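One thing visible in the traceback is that `category_manager.get_ancestors` calls `requests.get(url, headers=...)` with no `timeout`, and requests waits indefinitely by default, so a slow or dead SRI can stall a test rather than fail it quickly. A minimal sketch, assuming a hypothetical `get_ancestors_remote` helper and illustrative timeout/retry values (none of this is ARAX code), of a session that retries dropped connections and 5xx replies, bounds each request, and degrades to `None` instead of raising:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Illustrative (connect, read) timeouts in seconds; not values taken from ARAX.
DEFAULT_TIMEOUT = (3.05, 10)


def make_sri_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Return a Session that retries dropped connections and 429/5xx replies with backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def get_ancestors_remote(session: requests.Session, url: str):
    """Hypothetical remote lookup: returns parsed JSON, or None if the service fails."""
    try:
        resp = session.get(url, headers={"accept": "application/json"}, timeout=DEFAULT_TIMEOUT)
        resp.raise_for_status()  # turns a 404 into HTTPError instead of bad data
        return resp.json()
    except (requests.exceptions.RequestException, ValueError):
        # RequestException covers ConnectionError (e.g. RemoteDisconnected), Timeout,
        # RetryError and HTTPError; ValueError covers a non-JSON body.
        return None
```

The caller still has to decide what a `None` means (for example, fall back to a local copy), which is where a bundled cache comes in.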

Maybe we need to design the system so that it builds a cache of ancestors and stores it in the "databases" area, as with our other 2.6.7 databases. It should be unchanging for a given database, I suppose; it will change when we migrate to Biolink 2.0.
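A minimal sketch of that idea, assuming the category hierarchy is available locally as a child-to-parent map (the map below is a simplified, illustrative subset, not copied from the Biolink YAML, and it ignores mixins). The ancestor lists are precomputed once, written to a JSON file tagged with the Biolink version, and refused at load time if the version does not match; the function names, file name, and version strings are hypothetical:

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Illustrative, simplified is_a parents for a few categories (single parent only;
# the real Biolink model also has mixins). Not copied from the Biolink YAML.
EXAMPLE_PARENTS: Dict[str, Optional[str]] = {
    "biolink:NamedThing": None,
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:MolecularEntity": "biolink:BiologicalEntity",
    "biolink:ChemicalSubstance": "biolink:MolecularEntity",
}


def compute_ancestors(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Walk each category's is_a chain to the root, nearest ancestor first."""
    ancestors: Dict[str, List[str]] = {}
    for category in parents:
        chain: List[str] = []
        current = parents[category]
        while current is not None and current not in chain:  # stop if the chain loops
            chain.append(current)
            current = parents.get(current)
        ancestors[category] = chain
    return ancestors


def write_cache(ancestors: Dict[str, List[str]], path: Path, biolink_version: str) -> None:
    """Persist the map together with the Biolink version it was built from."""
    payload = {"biolink_version": biolink_version, "ancestors": ancestors}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True))


def load_cache(path: Path, expected_version: str) -> Dict[str, List[str]]:
    """Load the map, refusing a cache built for a different Biolink version."""
    payload = json.loads(path.read_text())
    if payload["biolink_version"] != expected_version:
        raise ValueError(
            f"ancestor cache is for Biolink {payload['biolink_version']}, expected {expected_version}"
        )
    return payload["ancestors"]
```

Tagging the file with the version makes the "it will change when we migrate" case fail loudly instead of silently serving stale ancestors.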

I suppose relying on this SRI service is not a good idea.

edeutsch • Jul 02 '21

There still seems to be an intermittent issue with the SRI being slow, causing test_predict_drug_treats_disease_attribute to fail.

We could explicitly skip these tests in Travis CI for now, until this is fixed or we add a cached version of the SRI.
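A minimal sketch of a Travis-only skip, assuming the ARAX tests run under pytest and relying on the fact that Travis CI sets the environment variable `TRAVIS=true` in its builds. The marker name `skip_on_travis` is hypothetical, and the decorated function below is only a placeholder for the existing test:

```python
import os

import pytest


def running_on_travis() -> bool:
    """True when running inside a Travis CI build (Travis sets TRAVIS=true)."""
    return os.environ.get("TRAVIS") == "true"


# Reusable marker for any SRI-dependent DTD test; evaluated once at import time.
skip_on_travis = pytest.mark.skipif(
    running_on_travis(),
    reason="DTD tests depend on slow SRI lookups; skipped on Travis until Biolink lookups are cached locally",
)


@skip_on_travis
def test_predict_drug_treats_disease_attribute():
    ...  # placeholder: the existing test body would go here
```

The tests still run locally, so the skip only hides the CI symptom; it does not address the slowdown itself.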

finnagin • Jul 12 '21

Should we revisit adding a cached version of the SRI now? Currently we cannot run the DTD tests on Travis because they take a really long time.

Or maybe the solution is to speed up the DTD tests some other way, since this only seems to be affecting these tests?

finnagin • Aug 10 '21

I do think it would be beneficial to have a module with a cached mapping of Biolink model categories to their ancestors and descendants. But I would be surprised if the SRI were so slow that tests fail. Subsequent calls should be cached, so only the initial calls should be a little slower. If the SRI is down, then this is certainly a point of failure, but are you sure that under normal circumstances SRI Biolink lookups are slow enough to make tests fail? That seems unlikely to me.
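The "only the first call is slow" argument holds only if every answer, including a failed lookup, ends up cached. The traceback shows requests_cache underneath `category_manager.get_ancestors`; if that setup uses requests_cache's defaults (which, as far as I know, store only 200 responses), the repeated 404 warnings for `biolink:ChemicalSubstance` would each be a fresh network round trip. That is a hypothesis to check, not a diagnosis. A minimal in-process sketch with `functools.lru_cache`, where `fetch_ancestors_from_service` is a hypothetical stand-in for the remote call that also counts how often it is hit:

```python
from functools import lru_cache
from typing import Dict, Tuple

# Counts how often the (stand-in) remote service is actually hit.
SERVICE_CALLS = {"count": 0}

# Illustrative data; a real implementation would query the SRI here.
_FAKE_SERVICE_DATA: Dict[str, Tuple[str, ...]] = {
    "biolink:ChemicalSubstance": (
        "biolink:MolecularEntity",
        "biolink:BiologicalEntity",
        "biolink:NamedThing",
    ),
}


def fetch_ancestors_from_service(category: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for the remote lookup; unknown categories yield ()."""
    SERVICE_CALLS["count"] += 1
    return _FAKE_SERVICE_DATA.get(category, ())


@lru_cache(maxsize=None)
def get_ancestors(category: str) -> Tuple[str, ...]:
    # Tuples keep the cached value immutable. Because a miss returns () rather than
    # raising, "not found" answers are cached too and are not re-requested.
    return fetch_ancestors_from_service(category)
```

Note that `lru_cache` only lives as long as the process, so it does not help a fresh CI instance; persisting the answers to disk is what covers that case.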

edeutsch • Aug 11 '21

Hmmm... I'll have to look into what is causing the slowdown, then. I timed the uncached version of test_predict_drug_treats_disease_attribute and it took 39006.68 seconds (about 10.8 hours) to finish.

finnagin • Aug 11 '21

Right, Biolink isn't that big. So in principle, that information should be ultrafast to look up, and very amenable to caching. This seems like a situation where some isolation testing to pinpoint the slowdown would be helpful, before we propose a solution.
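For that kind of isolation testing, Python's built-in profiler can show where the time actually goes before anyone commits to a fix. A minimal sketch (the helper name `profile_call` is hypothetical) that runs a single call under `cProfile` and returns a report sorted by cumulative time; in practice one might wrap a step named in the traceback, such as `synonymizer.get_canonical_curies(...)` or `PDTD.predict_drug_treats_disease()`:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()
```

If most of the cumulative time lands under `requests`/`urllib3` frames, the network is the bottleneck; if it lands elsewhere (for example in per-node work for a very general disease CURIE), caching Biolink lookups alone will not fix the test.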

saramsey • Aug 11 '21

If need be, couldn't we get that information from KG2? I mean, all of Biolink's metamodel is in KG2.

saramsey • Aug 11 '21

Reformatting the stack trace so it is more human-readable:

error_code: UncaughtARAXiError message: An uncaught error occurred: ('Connection
aborted.', RemoteDisconnected('Remote end closed connection without response')):
['Traceback (most recent call last):\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 677, in urlopen\n chunked=chunked,\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 426, in _make_request\n six.raise_from(e, None)\n', ' File "<string>", line
3, in raise_from\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 421, in _make_request\n httplib_response = conn.getresponse()\n', ' File
"/opt/python/3.7.6/lib/python3.7/http/client.py", line 1344, in getresponse\n
response.begin()\n', ' File "/opt/python/3.7.6/lib/python3.7/http/client.py",
line 306, in begin\n version, status, reason = self._read_status()\n', ' File
"/opt/python/3.7.6/lib/python3.7/http/client.py", line 275, in _read_status\n
raise RemoteDisconnected("Remote end closed connection without"\n',
'http.client.RemoteDisconnected: Remote end closed connection without
response\n', '\nDuring handling of the above exception, another exception
occurred:\n\n', 'Traceback (most recent call last):\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/adapters.py",
line 449, in send\n timeout=timeout\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 727, in urlopen\n method, url, error=e, _pool=self,
_stacktrace=sys.exc_info()[2]\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/util/retry.py",
line 410, in increment\n raise six.reraise(type(error), error, _stacktrace)\n',
' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/packages/six.py",
line 734, in reraise\n raise value.with_traceback(tb)\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 677, in urlopen\n chunked=chunked,\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 426, in _make_request\n six.raise_from(e, None)\n', ' File "<string>", line
3, in raise_from\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/urllib3/connectionpool.py",
line 421, in _make_request\n httplib_response = conn.getresponse()\n', ' File
"/opt/python/3.7.6/lib/python3.7/http/client.py", line 1344, in getresponse\n
response.begin()\n', ' File "/opt/python/3.7.6/lib/python3.7/http/client.py",
line 306, in begin\n version, status, reason = self._read_status()\n', ' File
"/opt/python/3.7.6/lib/python3.7/http/client.py", line 275, in _read_status\n
raise RemoteDisconnected("Remote end closed connection without"\n',
"urllib3.exceptions.ProtocolError: ('Connection aborted.',
RemoteDisconnected('Remote end closed connection without response'))\n",
'\nDuring handling of the above exception, another exception occurred:\n\n',
'Traceback (most recent call last):\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/ARAX_query.py", line
533, in execute_processing_plan\n overlay.apply(response,
action[\'parameters\'])\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/ARAX_overlay.py",
line 531, in apply\n \'action\'])() # thank you
https://stackoverflow.com/questions/11649848/call-methods-by-string\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/ARAX_overlay.py",
line 847, in __predict_drug_treats_disease\n response =
PDTD.predict_drug_treats_disease()\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/Overlay/predict_drug_treats_disease.py",
line 196, in predict_drug_treats_disease\n converted_target_curie =
self.convert_to_trained_curies(target_curie)\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/test/../ARAXQuery/Overlay/predict_drug_treats_disease.py",
line 139, in convert_to_trained_curies\n normalizer_result =
self.synonymizer.get_canonical_curies(curies=[input_curie],
return_all_categories=True)\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/ARAXQuery/../NodeSynonymizer/node_synonymizer.py",
line 1963, in get_canonical_curies\n results[entity][\'expanded_categories\'] =
category_manager.get_expansive_categories(row[4])\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/ARAXQuery/../NodeSynonymizer/category_manager.py",
line 133, in get_expansive_categories\n ancestors =
self.get_ancestors(category)\n', ' File
"/home/travis/build/RTXteam/RTX/code/ARAX/ARAXQuery/../NodeSynonymizer/category_manager.py",
line 86, in get_ancestors\n response_content = requests.get(url,
headers={\'accept\': \'application/json\'})\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/api.py",
line 76, in get\n return request(\'get\', url, params=params, **kwargs)\n', '
File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/api.py",
line 61, in request\n return session.request(method=method, url=url,
**kwargs)\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests_cache/core.py",
line 136, in request\n **kwargs\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/sessions.py",
line 530, in request\n resp = self.send(prep, **send_kwargs)\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests_cache/core.py",
line 109, in send\n return send_request_and_cache_response()\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests_cache/core.py",
line 97, in send_request_and_cache_response\n response = super(CachedSession,
self).send(request, **kwargs)\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/sessions.py",
line 643, in send\n r = adapter.send(request, **kwargs)\n', ' File
"/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/requests/adapters.py",
line 498, in send\n raise ConnectionError(err, request=request)\n',
"requests.exceptions.ConnectionError: ('Connection aborted.',
RemoteDisconnected('Remote end closed connection without response'))\n"]

saramsey • Aug 11 '21

A potential issue is that the CURIE the test uses is the general cancer node, so updating the test to use a different CURIE might speed it up a lot.

finnagin • Aug 11 '21

OK, as discussed at Wednesday's AHM, I added a centralized BiolinkHelper module (which uses a local copy of the Biolink model). Whether or not the SRI is causing the slowdown in this issue, we agreed at the AHM that we really should centralize Biolink lookup functionality anyway.

I already plugged it into Expand, but I think DTD, the synonymizer, and any other modules that use the CategoryManager can now switch over to the new BiolinkHelper. Available methods are documented here: https://github.com/RTXteam/RTX/tree/master/code/ARAX/BiolinkHelper

Let me know if anyone needs additional methods or tweaks!

amykglen • Aug 13 '21

One thing to note is that it does not include ARAX's hard-coded "conflations" (e.g., protein == gene) at the moment. Do we want it to contain those? I wasn't sure if we should muddle our conflations with what's actually in Biolink, but I suppose I could add an include_conflations parameter to get_descendants() and get_ancestors() that makes it controllable.

amykglen • Aug 13 '21

I went ahead and added ARAX-defined conflations to the get_descendants() and get_ancestors() methods; conflations are now included by default, but you can control that behavior via the include_conflations parameter (the README has been updated).
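To illustrate what an `include_conflations` switch on ancestor/descendant lookups can mean, here is a minimal sketch over a local hierarchy. The class `LocalBiolinkHierarchy` and its method signatures are hypothetical and only mirror the names mentioned above; they are not BiolinkHelper's actual code or API (see its README for that). The four-category hierarchy is simplified and illustrative, and the design choices (inputs are included in their own result sets; a conflation group is pulled in whenever any of its members is queried) belong to this sketch:

```python
from typing import Dict, Iterable, List, Set


class LocalBiolinkHierarchy:
    """Hypothetical, simplified local lookup (not BiolinkHelper's implementation)."""

    def __init__(self, parents: Dict[str, List[str]], conflations: Iterable[Iterable[str]] = ()):
        self.parents = parents
        self.children: Dict[str, List[str]] = {}
        for child, child_parents in parents.items():
            for parent in child_parents:
                self.children.setdefault(parent, []).append(child)
        self.conflations = [set(group) for group in conflations]

    @staticmethod
    def _reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
        """Iterative depth-first search returning every node reachable from start."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            for nxt in edges.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def _expand(self, categories: Iterable[str], edges: Dict[str, List[str]],
                include_conflations: bool) -> Set[str]:
        start = set(categories)
        if include_conflations:
            for group in self.conflations:
                if start & group:  # querying any member pulls in the whole group
                    start |= group
        result = set(start)  # inputs are included in their own result sets
        for category in start:
            result |= self._reachable(category, edges)
        return result

    def get_ancestors(self, categories: Iterable[str], include_conflations: bool = True) -> Set[str]:
        return self._expand(categories, self.parents, include_conflations)

    def get_descendants(self, categories: Iterable[str], include_conflations: bool = True) -> Set[str]:
        return self._expand(categories, self.children, include_conflations)


# Simplified, illustrative hierarchy (not the real Biolink structure).
EXAMPLE_PARENTS = {
    "biolink:NamedThing": [],
    "biolink:BiologicalEntity": ["biolink:NamedThing"],
    "biolink:Gene": ["biolink:BiologicalEntity"],
    "biolink:Protein": ["biolink:BiologicalEntity"],
}
```

With a gene/protein conflation group, `get_descendants(["biolink:Gene"])` also returns `biolink:Protein`, while passing `include_conflations=False` returns only what the hierarchy itself says.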

amykglen • Aug 13 '21

Awesome! Thanks @amykglen !

finnagin • Aug 13 '21

Placing this on a future AHM agenda.

dkoslicki • Jun 05 '22

6/22 AHM: ask Chunyu if he's already using this and if it can be closed

dkoslicki • Jun 22 '22

Forgot to tag @chunyuma for the above question

dkoslicki • Jun 22 '22

6/22 AHM: ask Chunyu if he's already using this and if it can be closed

@dkoslicki, if I understand correctly, this is asking whether I use the centralized BiolinkHelper module developed by Amy in DTD, right? If so, DTD is based on the synonymizer rather than the CategoryManager, so it should not affect DTD.

chunyuma • Jun 23 '22

It seems like this issue has been solved, so I'm closing it.

chunyuma • Jun 17 '24