
Cache calls

Open · mpadge opened this issue

Instead of repeatedly hitting the overpass API, it'd be pretty easy to implement a local caching system that records each call and stores the pre-processed data returned from the API. Subsequent identical calls would then just re-load the local data and deliver that anew.

The R.cache package has a hard-coded default that only allows enduring storage in `"~/.Rcache/"`, set in its `.onLoad()` call. This package sticks a few things in `options()`, but does not use any environment variables.

A bit more flexibility could be added here via environment variables, by defaulting to `~/.Rosmdata` (or maybe piggybacking on `~/.Rcache` if it exists?), but allowing an override whenever `Sys.getenv("OSMDATA_CACHE_DIR")` is set.
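A minimal sketch of that lookup order, using only base R (`OSMDATA_CACHE_DIR` and `~/.Rosmdata` are as proposed above; the helper name is hypothetical):

```r
# Hypothetical helper: resolve the cache directory, preferring an
# explicit environment variable over a package default.
osmdata_cache_dir <- function () {
    cache_dir <- Sys.getenv ("OSMDATA_CACHE_DIR")
    if (!nzchar (cache_dir)) {
        # piggyback on ~/.Rcache if R.cache has already created it:
        cache_dir <- if (dir.exists ("~/.Rcache")) "~/.Rcache" else "~/.Rosmdata"
    }
    cache_dir <- path.expand (cache_dir)
    if (!dir.exists (cache_dir)) {
        dir.create (cache_dir, recursive = TRUE)
    }
    cache_dir
}
```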

cache duration

Because OSM is constantly updated, it will be important to allow control over cache duration, so that local versions are automatically updated at some stage. While this could also be handled via an environment variable, `"OSMDATA_CACHE_DURATION"`, that would need to be explicitly set by a user to work, and so would impose an additional burden.
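The env-var route would nevertheless be trivial to support alongside a default, so that users who never set it pay no cost (helper name and default value are hypothetical):

```r
# Hypothetical helper: read a cache duration (in hours) from the
# environment, falling back to a default so users need not set anything.
get_cache_duration <- function (default_hours = 24) {
    val <- Sys.getenv ("OSMDATA_CACHE_DURATION")
    if (nzchar (val)) as.numeric (val) else default_hours
}
```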

A less burdensome option would be an equivalent function parameter, which would best be placed in `overpass_query()`, because it's the overpass calls themselves that will actually be cached. The problem there is that that function is not exported. The general workflow is

```r
opq() %>%
    add_osm_feature() %>%
    osmdata_sf/sp/sc/xml/pbf()
```

A `cache_duration` parameter could potentially be set in the initial `opq()` call, but that call does not contain the full overpass query, so the parameter would then need to be passed on to all subsequent functions. That suggests the end-point calls are the best place for such a parameter. These currently have only two primary parameters (`q`, `doc`), so would not suffer from an additional one.

If that is the point at which caching is determined, then it will likely be better to cache the full processed result, rather than just the direct result of the API call. The call itself could be `digest`-ed, while the cached object would be the final processed end-point. The timestamp could simply be read (`file.info()$mtime`), and the cache updated if `difftime(...) > cache_duration`; otherwise the cached version would just be re-loaded.
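Putting those pieces together, a sketch of an end-point cache might look like the following. Function and parameter names are hypothetical, `query_fun` stands in for the non-exported `overpass_query()` call plus subsequent processing, and a base-R md5 stand-in replaces `digest::digest()` to keep the sketch dependency-free:

```r
# Hash the query string via a temporary file and tools::md5sum()
# (a base-R stand-in for digest::digest()).
hash_query <- function (q) {
    tmp <- tempfile ()
    writeLines (q, tmp)
    unname (tools::md5sum (tmp))
}

# Cache the fully processed result of a query as an .Rds file, using
# file.info()$mtime and difftime() to decide when it has gone stale.
cached_query <- function (q, query_fun, cache_dir = tempdir (),
                          cache_duration = 24) { # hours
    f <- file.path (cache_dir, paste0 (hash_query (q), ".Rds"))
    if (file.exists (f)) {
        age <- difftime (Sys.time (), file.info (f)$mtime, units = "hours")
        if (age < cache_duration) {
            return (readRDS (f)) # fresh enough: just re-load cached version
        }
    }
    res <- query_fun (q) # the actual API call plus processing
    saveRDS (res, f)
    res
}
```

In the package itself the `cache_duration` parameter would then live on the exported `osmdata_sf()`/`_sp()`/`_sc()`/`_xml()`/`_pbf()` end points, as argued above.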

mpadge · Nov 26 '18