astroquery icon indicating copy to clipboard operation
astroquery copied to clipboard

ENH: Cache refactoring

Open ceb8 opened this issue 5 years ago • 13 comments
trafficstars

This PR is a continuation of https://github.com/astropy/astroquery/pull/442 so I am not sure it should be it's own separate thing (@bsipocz @keflavich @jwoillez).

Work included:

  • Rebased https://github.com/astropy/astroquery/pull/442 onto current master
  • Cache timeout functionality (see https://github.com/astropy/astroquery/pull/442)
  • Added configurable cache location (defaults to current cache location)
  • Added option to turn off caching all-together (addresses https://github.com/astropy/astroquery/issues/1511)
  • Added support for cache_timeout = None which prevents the cache from ever timing out
  • Added clear_cache function that removes all cache pickle files from the current cache location (addresses https://github.com/astropy/astroquery/issues/1531)
  • Added reset_cache_preferences that resets cache location, timout, and enabled/disabled status to astroquery defaults
  • Moved cache directory creation (if necessary) into to_cache function, guarding against users who manually blow away their cache while in the middle of a session (addresses https://github.com/astropy/astroquery/issues/1625)
  • _request function cache argument defaults to "None," which means that the default cache preference will be used, setting to either True or False overrides the the global preference.
  • Reordered the local_filepath or order in _request, so that savedir overrides the global cache location if set (I believe this is the intended behavior)

Questions:

  • I kept the updates to astroquery.cfg added by https://github.com/astropy/astroquery/pull/442, but did not further update, because I don't understand how this file is used. @bsipocz @keflavich what is the purpose of this file?
  • suspend_chache function: I don't understand how this function works or is being used. It sets the cache active status to False, however as far as I can tell we only cache when the to_cache function is explicitly called, so it isn't clear to me that it is actually doing anything. I assume its presence in the login function is to make sure that login credentials aren't accidentally cached which is very important, so I don't want to mess that up. I changed the private class member _cache_active into the public member use_cache to allow users to turn caching on/off, but it is unclear to me if I can just change self.obj._cache_active to self.obj.use_cache in the suspend_cache function or if something more complicated is needed. @bsipocz @keflavich if you can clarify this, I will update suspend_cache properly.

Outstanding work:

  • Documentation on how to use the new cache functionality
  • Test cases for the new cache functionality

ceb8 avatar Feb 05 '20 14:02 ceb8

Hello @ceb8! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2022-09-12 15:47:26 UTC

pep8speaks avatar Feb 05 '20 14:02 pep8speaks

Codecov Report

Merging #1634 (016f6b4) into main (f74e7c0) will increase coverage by 0.10%. The diff coverage is 93.75%.

@@            Coverage Diff             @@
##             main    #1634      +/-   ##
==========================================
+ Coverage   63.97%   64.08%   +0.10%     
==========================================
  Files         132      131       -1     
  Lines       16984    16927      -57     
==========================================
- Hits        10866    10847      -19     
+ Misses       6118     6080      -38     
Impacted Files Coverage Δ
astroquery/query.py 62.40% <93.75%> (+4.03%) :arrow_up:
astroquery/utils/commons.py 76.71% <0.00%> (-1.68%) :arrow_down:
astroquery/conftest.py 78.78% <0.00%> (-1.22%) :arrow_down:
astroquery/alfalfa/core.py 95.31% <0.00%> (-0.69%) :arrow_down:
astroquery/utils/tap/xmlparser/jobSaxParser.py 96.87% <0.00%> (-0.04%) :arrow_down:
astroquery/noirlab/core.py
astroquery/ipac/irsa/irsa_dust/core.py 81.62% <0.00%> (+0.04%) :arrow_up:
astroquery/vizier/core.py 78.59% <0.00%> (+5.13%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Feb 05 '20 15:02 codecov[bot]

@ceb8 Great to see you picking up this topic!

I should probably have gotten back to you sooner on your suspend_cache question. This is a context manager that let you cleanly disable the cache for some blocks of operations. See an example there: https://github.com/astropy/astroquery/blob/52b490c10074d45f0ed84990fd6cfc07a833ccd4/astroquery/eso/core.py#L713

My understanding of astroquery.cfg is that this is now an auto-generated file. You do not have to keep it. The default_cache_timeout = 86400 will appear by itself on first/new use, in the user's astroquery.cfg. But I may be wrong...

jwoillez avatar Apr 17 '20 09:04 jwoillez

I had some old comments, but this now conflicts, so will come back to review once it's rebased.

bsipocz avatar Apr 22 '20 08:04 bsipocz

@ceb will this PR support MAST case, where the MAST service uses a long-running polling mechanism such that the raw http response from MAST is really just a status code, not the actual responses?

I'm trying to ascertain to see if this PR makes #1578 obsolete.

orionlee avatar Aug 25 '20 14:08 orionlee

@orionlee No it doesn't make that PR obsolete. Because the MAST DBs update so regularly as new data comes in all the time, I had wanted this underlying work done before implementing true caching on MAST to avoid problems with outdated results being returned forever, but it doesn't solve any of the issues inherent in the long polling nature of the mast servers.

ceb8 avatar Aug 25 '20 14:08 ceb8

@ceb Thanks for the clarification.

orionlee avatar Aug 26 '20 00:08 orionlee

What about looking into using one of the following?

  • https://github.com/reclosedev/requests-cache
  • https://github.com/ionrock/cachecontrol

jwoillez avatar Mar 03 '21 22:03 jwoillez

I would change this to a draft PR rather than dragging through more milestone changes. We can change back from draft whenever someone will have the time of wrapping it up. Thanks!

bsipocz avatar Jun 29 '21 23:06 bsipocz

@bsipocz I'm marking this ready for review. I haven't written the documentation yet, but the tests are in place, and I can do the docs concurrent with working on the code, so review at will.

ceb8 avatar Aug 03 '22 16:08 ceb8

I think using the configuration system to control caching is a good idea, but this pull request could make better use of it. Most importantly the configuration items should be accessed as late as possible, not stored in class attributes. If the cache_location class attribute is required for backwards compatibility then it should be implemented like in #2341. Properly using the configuration system would also allow its set_temp context manager to safely replace suspend_cache.

eerovaher avatar Aug 16 '22 20:08 eerovaher

OK, so let's see how this works together with #1578.

bsipocz avatar Sep 11 '22 08:09 bsipocz

@bsipocz I'm working on implementing @eerovaher's suggested changes. I'm almost there, and when I've got them in place I will integrate https://github.com/astropy/astroquery/pull/1578.

ceb8 avatar Sep 12 '22 09:09 ceb8

I'll plan to have a final look at this on the train on Monday, thanks for all the patience @ceb8.

bsipocz avatar Oct 28 '22 12:10 bsipocz

OK, I would say let's merge this now. If there is anything left, small follow-ups are always welcome.

Also, @ceb8, could you please go through the cache-related issues and close the ones that are not relevant anymore after the merge? I see none are linked for auto-closure, but I don't think it's worth keeping this open for that reason.

bsipocz avatar Nov 10 '22 19:11 bsipocz