searvey

Refresh cached metadata

Open · pmav99 opened this issue 2 years ago · 4 comments

In the IOC, COOPS and USGS modules we cache the retrieved metadata. This is really useful, e.g. for running the tests, but it can be problematic for long-running processes (days, weeks or months). The first call caches the metadata and there is currently no easy way to update it.

I was thinking that we should add an extra argument to the get_*_stations() functions, something like refresh_cache: bool = False. This way we keep the existing behavior, and anyone who needs to refresh the cache will be able to do so.

As far as the actual implementation goes, we would need something like this: https://stackoverflow.com/a/37654201/592289
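A minimal sketch of the proposed `refresh_cache` flag, assuming the metadata are cached with `functools.lru_cache` (an assumption; the linked answer may take a different approach). `_fetch_ioc_metadata` and `_get_ioc_metadata_cached` are hypothetical helpers, and `get_ioc_stations` here is a simplified stand-in, not searvey's actual signature.

```python
import functools

_FETCH_COUNT = 0  # only used to show when a real fetch happens


def _fetch_ioc_metadata() -> list:
    # Hypothetical stand-in for the network request that downloads station metadata.
    global _FETCH_COUNT
    _FETCH_COUNT += 1
    return [{"ioc_code": "abcd", "fetch_no": _FETCH_COUNT}]


@functools.lru_cache(maxsize=1)
def _get_ioc_metadata_cached() -> list:
    # The first call populates the cache; later calls return the same object.
    return _fetch_ioc_metadata()


def get_ioc_stations(refresh_cache: bool = False) -> list:
    """Return IOC station metadata, optionally discarding the cached copy first."""
    if refresh_cache:
        # Empty the cache so the call below re-downloads the metadata.
        _get_ioc_metadata_cached.cache_clear()
    return _get_ioc_metadata_cached()
```

With the default `refresh_cache=False` the behavior is unchanged; `get_ioc_stations(refresh_cache=True)` forces one new download, and subsequent default calls reuse that fresh copy.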

pinging @brey @SorooshMani-NOAA

pmav99, Jul 24 '23 09:07

I like the idea. We need to establish a threshold for the refresh to kick in. Ideally, this should be internal and not visible to the user, although a warning or info message might be needed for transparency.
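One way to keep the refresh internal, sketched under our own assumptions: a hypothetical `refresh_after` decorator records when the metadata were fetched and transparently re-fetches them once they are older than a threshold, emitting an INFO-level log record rather than a warning. It only handles zero-argument functions, it is not thread-safe, and the `clock` parameter exists purely to make it testable.

```python
import functools
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def refresh_after(
    max_age_seconds: float, clock: Callable[[], float] = time.monotonic
) -> Callable[[Callable[[], T]], Callable[[], T]]:
    """Hypothetical decorator: cache a zero-argument function's result and
    re-fetch it once the cached value is older than max_age_seconds."""

    def decorator(func: Callable[[], T]) -> Callable[[], T]:
        state: dict = {"value": None, "fetched_at": None}

        @functools.wraps(func)
        def wrapper() -> T:
            now = clock()
            fetched_at = state["fetched_at"]
            if fetched_at is None or now - fetched_at > max_age_seconds:
                if fetched_at is not None:
                    # INFO, not a warning: only shown if the user configures logging.
                    logger.info("Cached %s is stale, refreshing", func.__name__)
                state["value"] = func()
                state["fetched_at"] = now
            return state["value"]

        return wrapper

    return decorator
```

`time.monotonic` is the default clock so that changes to the system time cannot make the cache look older or newer than it really is.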

Maybe we also need to document how users can persist the metadata when using searvey, if that is required.

brey, Jul 24 '23 12:07

Having the ability to reset helps. Ideally, this should be available both as an automatic operation (e.g. every day or hour) for non-developer users and as a manual reset for others. We already know that calling cache_clear() covers the manual part; for the automatic part, this is an interesting idea: https://stackoverflow.com/questions/31771286/python-in-memory-cache-with-time-to-live

There's also this package: https://cachetools.readthedocs.io/en/latest/ Although, let's maybe think twice before adding more dependencies.

SorooshMani-NOAA, Jul 24 '23 12:07

WRT persisting searvey's metadata: since we are using standard (geo)pandas objects, I don't think we need to provide a specific API for this. Adding a note in the docs and/or an example in the notebooks wouldn't be a bad idea, though.
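A possible docs/notebook example of the kind suggested here, assuming the metadata come back as a pandas `DataFrame`: a hypothetical `load_or_fetch` helper pickles the result to disk and only calls the fetch function again once the file is older than a given age. `GeoDataFrame` subclasses `DataFrame`, so `to_pickle`/`pd.read_pickle` should work for it too; `GeoDataFrame.to_parquet` with `geopandas.read_parquet` is another option.

```python
import os
import tempfile
import time
from typing import Callable

import pandas as pd


def load_or_fetch(path: str, fetch: Callable[[], pd.DataFrame], max_age_seconds: float) -> pd.DataFrame:
    """Hypothetical helper: reuse the pickled metadata at `path` unless it is too old."""
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age_seconds:
        return pd.read_pickle(path)
    df = fetch()  # e.g. one of the get_*_stations() functions
    df.to_pickle(path)
    return df


def fake_fetch() -> pd.DataFrame:
    # Hypothetical stand-in for a real get_*_stations() call.
    return pd.DataFrame({"station": ["a", "b"], "lat": [10.0, 20.0]})


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "stations.pkl")
    first = load_or_fetch(cache_file, fake_fetch, max_age_seconds=24 * 3600)
    second = load_or_fetch(cache_file, fake_fetch, max_age_seconds=24 * 3600)  # read from disk
    print(first.equals(second))
```

Because the file's modification time acts as the fetch timestamp, the same helper also gives long-running processes a simple on-disk refresh threshold that survives restarts.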

I didn't think of automatically invalidating the cache after some time, but I agree it is a good idea, and that SO answer seems to provide a rather elegant way of doing so without introducing any 3rd-party dependencies. WRT adding a runtime warning, to be honest I am -1. We should definitely document it, but a warning each time you call a function seems to be too much. Moreover, you would get 3 warnings when calling searvey.get_stations(), etc.

pmav99, Jul 24 '23 14:07