astroquery
astroquery copied to clipboard
ENH: Cache refactoring
This PR is a continuation of https://github.com/astropy/astroquery/pull/442 so I am not sure it should be it's own separate thing (@bsipocz @keflavich @jwoillez).
Work included:
- Rebased https://github.com/astropy/astroquery/pull/442 onto current master
- Cache timeout functionality (see https://github.com/astropy/astroquery/pull/442)
- Added configurable cache location (defaults to current cache location)
- Added option to turn off caching all-together (addresses https://github.com/astropy/astroquery/issues/1511)
- Added support for
cache_timeout = Nonewhich prevents the cache from ever timing out - Added
clear_cachefunction that removes all cache pickle files from the current cache location (addresses https://github.com/astropy/astroquery/issues/1531) - Added
reset_cache_preferencesthat resets cache location, timout, and enabled/disabled status to astroquery defaults - Moved cache directory creation (if necessary) into
to_cachefunction, guarding against users who manually blow away their cache while in the middle of a session (addresses https://github.com/astropy/astroquery/issues/1625) _requestfunction cache argument defaults to "None," which means that the default cache preference will be used, setting to either True or False overrides the the global preference.- Reordered the local_filepath
ororder in_request, so that savedir overrides the global cache location if set (I believe this is the intended behavior)
Questions:
- I kept the updates to astroquery.cfg added by https://github.com/astropy/astroquery/pull/442, but did not further update, because I don't understand how this file is used. @bsipocz @keflavich what is the purpose of this file?
suspend_chachefunction: I don't understand how this function works or is being used. It sets the cache active status to False, however as far as I can tell we only cache when theto_cachefunction is explicitly called, so it isn't clear to me that it is actually doing anything. I assume its presence in the login function is to make sure that login credentials aren't accidentally cached which is very important, so I don't want to mess that up. I changed the private class member_cache_activeinto the public memberuse_cacheto allow users to turn caching on/off, but it is unclear to me if I can just changeself.obj._cache_activetoself.obj.use_cachein thesuspend_cachefunction or if something more complicated is needed. @bsipocz @keflavich if you can clarify this, I will updatesuspend_cacheproperly.
Outstanding work:
- Documentation on how to use the new cache functionality
- Test cases for the new cache functionality
Hello @ceb8! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2022-09-12 15:47:26 UTC
Codecov Report
Merging #1634 (016f6b4) into main (f74e7c0) will increase coverage by
0.10%. The diff coverage is93.75%.
@@ Coverage Diff @@
## main #1634 +/- ##
==========================================
+ Coverage 63.97% 64.08% +0.10%
==========================================
Files 132 131 -1
Lines 16984 16927 -57
==========================================
- Hits 10866 10847 -19
+ Misses 6118 6080 -38
| Impacted Files | Coverage Δ | |
|---|---|---|
| astroquery/query.py | 62.40% <93.75%> (+4.03%) |
:arrow_up: |
| astroquery/utils/commons.py | 76.71% <0.00%> (-1.68%) |
:arrow_down: |
| astroquery/conftest.py | 78.78% <0.00%> (-1.22%) |
:arrow_down: |
| astroquery/alfalfa/core.py | 95.31% <0.00%> (-0.69%) |
:arrow_down: |
| astroquery/utils/tap/xmlparser/jobSaxParser.py | 96.87% <0.00%> (-0.04%) |
:arrow_down: |
| astroquery/noirlab/core.py | ||
| astroquery/ipac/irsa/irsa_dust/core.py | 81.62% <0.00%> (+0.04%) |
:arrow_up: |
| astroquery/vizier/core.py | 78.59% <0.00%> (+5.13%) |
:arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
@ceb8 Great to see you picking up this topic!
I should probably have gotten back to you sooner on your suspend_cache question. This is a context manager that let you cleanly disable the cache for some blocks of operations. See an example there: https://github.com/astropy/astroquery/blob/52b490c10074d45f0ed84990fd6cfc07a833ccd4/astroquery/eso/core.py#L713
My understanding of astroquery.cfg is that this is now an auto-generated file. You do not have to keep it. The default_cache_timeout = 86400 will appear by itself on first/new use, in the user's astroquery.cfg. But I may be wrong...
I had some old comments, but this now conflicts, so will come back to review once it's rebased.
@ceb will this PR support MAST case, where the MAST service uses a long-running polling mechanism such that the raw http response from MAST is really just a status code, not the actual responses?
I'm trying to ascertain to see if this PR makes #1578 obsolete.
@orionlee No it doesn't make that PR obsolete. Because the MAST DBs update so regularly as new data comes in all the time, I had wanted this underlying work done before implementing true caching on MAST to avoid problems with outdated results being returned forever, but it doesn't solve any of the issues inherent in the long polling nature of the mast servers.
@ceb Thanks for the clarification.
What about looking into using one of the following?
- https://github.com/reclosedev/requests-cache
- https://github.com/ionrock/cachecontrol
I would change this to a draft PR rather than dragging through more milestone changes. We can change back from draft whenever someone will have the time of wrapping it up. Thanks!
@bsipocz I'm marking this ready for review. I haven't written the documentation yet, but the tests are in place, and I can do the docs concurrent with working on the code, so review at will.
I think using the configuration system to control caching is a good idea, but this pull request could make better use of it. Most importantly the configuration items should be accessed as late as possible, not stored in class attributes. If the cache_location class attribute is required for backwards compatibility then it should be implemented like in #2341. Properly using the configuration system would also allow its set_temp context manager to safely replace suspend_cache.
OK, so let's see how this works together with #1578.
@bsipocz I'm working on implementing @eerovaher's suggested changes. I'm almost there, and when I've got them in place I will integrate https://github.com/astropy/astroquery/pull/1578.
I'll plan to have a final look at this on the train on Monday, thanks for all the patience @ceb8.
OK, I would say let's merge this now. If there is anything left, small follow-ups are always welcome.
Also, @ceb8, could you please go through the cache-related issues and close the ones that are not relevant anymore after the merge? I see none are linked for auto-closure, but I don't think it's worth keeping this open for that reason.