Support for storr caches?
I love @richfitz's storr API, and it opens up all sorts of storage backends without requiring extra work from memoise itself.
Couldn't that go into a separate package? But I'm happy to implement it wherever sensible.
This proof of concept seems to work:
```r
library(memoise)
library(storr)
library(cachem)
library(rlang)

st <- storr_environment()

wrap_storr_cache <- function(st,
                             missing = cachem::key_missing(),
                             namespace = st$default_namespace) {
  missing_ <- enquo(missing)
  structure(
    list(
      get = function(key, missing = missing_) {
        tryCatch(
          st$get(key, namespace = namespace),
          error = function(e) eval_tidy(as_quosure(missing))
        )
      },
      set = function(key, value) st$set(key, value, namespace = namespace),
      exists = function(key) st$exists(key, namespace = namespace),
      remove = function(key) st$del(key, namespace = namespace),
      reset = function() st$clear(namespace = namespace),
      keys = function() st$list(namespace = namespace),
      prune = st$gc, # gc is not namespaced
      # size() should return a count of cached objects, not the keys themselves
      size = function() length(st$list(namespace = namespace)),
      ## Hack for debugging: keep a reference to the storr object
      ## itself as well
      .st = st
    ),
    ## Not sure if this is correct, but it makes it print nicely
    class = c("cachem")
  )
}

st_cache <- wrap_storr_cache(st)

mysqrt <- function(x) {
  message("Computing sqrt of ", deparse1(x))
  sqrt(x)
}

mysqrt_memo <- memoise(mysqrt, cache = st_cache)
mysqrt_memo(1:10)
mysqrt_memo(1:10)
mysqrt_memo(1:10)
```
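If the wrapper behaves, only the first call prints the "Computing sqrt of 1:10" message; the later calls come straight from the storr cache. A quick sanity check, assuming memoise stores a single hashed key for the repeated call:

```r
## After the three calls above there should be exactly one hashed key in
## storr's default namespace, and the cachem-style wrapper should see it too.
st$list()
length(st$list()) == 1
st_cache$exists(st$list()[[1]])
```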
That seems like a reasonable implementation. I don't know which package it should live in, though.
A few other notes:
- I think the `tryCatch()` should catch `KeyError` rather than the more general `error` (though I could be wrong in my understanding of how storr works) -- a sketch covering this and the next point follows the list.
- If this function were to go in storr, that package could avoid taking a dependency on cachem. Instead of calling `cachem::key_missing()`, it could simply return an object that has the same structure.
- With the disk cache implemented in the cachem package, the `get()` method is atomic -- it doesn't check for the key's existence and fetch the value in two separate steps. This is important for avoiding race conditions. I don't know if storr's object stores work the same way, but it is best if they do. Otherwise, if multiple R processes share the same object store, an object can be deleted from the store after the existence check but before the value fetch.
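To make the first two points concrete, here is a rough sketch (not tested against storr) of a `get()` method that catches only storr's classed `KeyError` condition and returns a `key_missing`-shaped sentinel without importing cachem. The condition class name and the sentinel's structure are assumptions based on the storr vignette and cachem's documentation:

```r
## Assumption: cachem's missing-value sentinel is an object of class
## "key_missing"; returning an object with the same class avoids a hard
## dependency on cachem.
key_missing_compat <- function() {
  structure(list(), class = "key_missing")
}

## A `get` method in the style of wrap_storr_cache() above, catching only the
## classed "KeyError" condition (assumed class name) so that unrelated errors
## still propagate instead of being silently turned into cache misses.
make_storr_get <- function(st, namespace = st$default_namespace) {
  function(key) {
    tryCatch(
      st$get(key, namespace = namespace),
      KeyError = function(e) key_missing_compat()
    )
  }
}
```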
I don't know about the rds backend, but I believe the more interesting storr backends all use databases that provide proper atomicity guarantees, even under concurrent access from multiple R processes.
Oh, it looks like storr specifically allows for the possibility of a key being stored while the corresponding data has been deleted from the object store: in that case it throws a `HashError`, as described here: https://richfitz.github.io/storr/articles/storr.html#classed-exceptions
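If so, the same `tryCatch()` could treat a dangling key as a cache miss too, by adding a second handler alongside the `KeyError` one in the sketch above (class names again assumed from the linked vignette):

```r
## Fragment of the get() sketch above: both "key absent" and "key present but
## data dropped" are treated as a cache miss.
tryCatch(
  st$get(key, namespace = namespace),
  KeyError  = function(e) key_missing_compat(),
  HashError = function(e) key_missing_compat()
)
```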
As for where this code should live, perhaps it should be added to cachem, along with additional code to implement the appropriate cache size/age expiry logic.
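For reference, cachem's own disk cache already exposes the kind of expiry knobs a storr-backed cache would need; something along these lines (argument names taken from my reading of `?cachem::cache_disk`, so double-check them):

```r
library(cachem)

## Roughly the size/age expiry settings a storr-backed cache would also want.
d <- cache_disk(
  dir      = tempfile("storr-like-cache-"),
  max_size = 50 * 1024^2,   # prune once the total size exceeds ~50 MB
  max_age  = 60 * 60 * 24,  # entries older than a day count as missing
  max_n    = 1000,          # keep at most 1000 objects
  evict    = "lru"          # evict least-recently-used entries first
)
```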