Consider adding Response caching for ARAX and/or KG2
- the biggest bang for the buck would be for ARAX; less so for KG2, though KG2 could also benefit somewhat
- there is already something called "ResponseCache" although it's more of a "response archive" than a "response cache"
- Since our code changes a lot, it seems prudent to clear the cache on every service restart, although something more clever could be contemplated
- Cache hits could be detected by taking the submitted query, removing the transient "callback" and "remote_address" fields, serializing the remainder in a repeatable way, and then hashing the result
- For ARAX, a cached Response could just point to the "response archive" and pull the Response from there, perhaps edit the "callback" and "remote_address" and send it back
- For KG2, responses are not currently archived, so we couldn't use that. We would likely have to store the Responses in a local cache that could be similar to the "response archive" but transient.
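The cache-key scheme described above might be sketched like this. This is illustrative only: the function name is hypothetical, and the choice of JSON canonicalization plus SHA-256 is one reasonable way to get a repeatable serialization and hash.

```python
import hashlib
import json

# Fields that vary between otherwise-identical submissions (from the
# proposal above); everything else participates in the cache key.
TRANSIENT_FIELDS = ("callback", "remote_address")

def compute_cache_key(query: dict) -> str:
    """Hypothetical helper: derive a repeatable cache key from a TRAPI query dict."""
    # Drop the transient fields so identical queries collide in the cache
    stable = {k: v for k, v in query.items() if k not in TRANSIENT_FIELDS}
    # Serialize deterministically: sorted keys, fixed separators
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two submissions that differ only in "callback" or "remote_address" then hash to the same key, which is exactly the cache-hit condition described above.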
Interesting idea. So I am wondering, how frequent are such repeated queries of the same TRAPI graph, that are not "testing if ARAX is working"? If it isn't a high proportion of our non-testing-related queries, then that might limit the positive benefit. On the other hand, it may not be that hard to implement. Are you thinking of an in-memory cache? That could pose very interesting/thorny threading issues. Or perhaps you were thinking of a SQLite cache on the local EBS volume; that would presumably be much faster than retrieving from the S3 bucket. And the cache would not be cross-service-instance, right? i.e., it would just be local to the specific ARAX service?
[For the "testing if ARAX is working" use-case, I presume that achieving sub-second response times is not really a high priority; that's why I phrased my question in terms of use-cases outside of testing-for-ARAX-not-being-broken].
> Interesting idea. So I am wondering, how frequent are such repeated queries of the same TRAPI graph, that are not "testing if ARAX is working"?
I don't have enough data to know.
> If it isn't a high proportion of our non-testing-related queries, then that might limit the positive benefit.
True.
> On the other hand, it may not be that hard to implement. Are you thinking of an in-memory cache? That could pose very interesting/thorny threading issues.
No; given all the forking involved, I don't think an in-memory cache would work.
> Or perhaps you were thinking of a SQLite cache on the local EBS volume; that would presumably be much faster than retrieving from the S3 bucket. And the cache would not be cross-service-instance, right? i.e., it would just be local to the specific ARAX service?
Exactly, yes to all.
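A local, transient SQLite cache of the kind discussed here could be sketched as follows. The file path, table name, and function names are all hypothetical; the one design point taken from the discussion is that the cache is cleared on every service restart and lives only on the local volume.

```python
import os
import sqlite3

CACHE_PATH = "/tmp/arax_response_cache.sqlite"  # hypothetical location on the local volume

def open_cache(clear: bool = True) -> sqlite3.Connection:
    """Open the local response cache, wiping it on restart as proposed."""
    if clear and os.path.exists(CACHE_PATH):
        os.remove(CACHE_PATH)  # our code changes a lot, so start fresh
    conn = sqlite3.connect(CACHE_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses "
        "(cache_key TEXT PRIMARY KEY, response_json TEXT)"
    )
    return conn

def get_cached(conn: sqlite3.Connection, cache_key: str):
    """Return the cached response JSON for this key, or None on a miss."""
    row = conn.execute(
        "SELECT response_json FROM responses WHERE cache_key = ?", (cache_key,)
    ).fetchone()
    return row[0] if row else None

def put_cached(conn: sqlite3.Connection, cache_key: str, response_json: str) -> None:
    """Store (or overwrite) a response under its cache key."""
    conn.execute("INSERT OR REPLACE INTO responses VALUES (?, ?)", (cache_key, response_json))
    conn.commit()
```

Because SQLite serializes writes through the file, this sidesteps the in-memory threading concerns raised above, at the cost of being local to the specific service instance.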
[For the "testing if ARAX is working" use-case, I presume that achieving sub-second response times is not really a high priority; that's why I phrased my question in terms of use-cases outside of testing-for-ARAX-not-being-broken].
Yes; ideally we would ensure that our "bypass_cache" option works correctly, and the watchdog would use "bypass_cache" while other clients would not.
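The gating logic for that option is trivial but worth pinning down; a minimal sketch, assuming "bypass_cache" arrives as a top-level boolean field on the query dict (the function name is hypothetical):

```python
def should_use_cache(query: dict) -> bool:
    """Hypothetical check: the watchdog sets bypass_cache so its
    health-check queries always execute fresh instead of hitting the cache."""
    return not query.get("bypass_cache", False)
```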