memoise icon indicating copy to clipboard operation
memoise copied to clipboard

Support for storr caches?

Open wlandau opened this issue 7 years ago • 6 comments

I love @richfitz's storr API, and it opens up all sorts of storage backends without requiring extra work from memoise itself.

wlandau avatar Aug 21 '18 12:08 wlandau

Couldn't that go into a separate package? But happy to implement it whereever sensible.

dirkschumacher avatar May 12 '20 16:05 dirkschumacher

This proof of concept seems to work:

library(memoise)
library(storr)
library(cachem)
library(rlang)

st <- storr_environment()

wrap_storr_cache <- function(st, missing = cachem::key_missing(), namespace = st$default_namespace) {
  missing_ <- enquo(missing)
  structure(
    list(
      get = function(key, missing = missing_) {
        tryCatch(
          st$get(key, namespace = namespace),
          error = function(e) eval_tidy(as_quosure(missing))
        )
      },
      set = function(key, value) st$set(key, value, namespace = namespace),
      exists = function(key) st$exists(key, namespace = namespace),
      remove = function(key) st$del(key, namespace = namespace),
      reset = function() st$clear(namespace = namespace),
      keys = function() st$list(namespace = namespace),
      prune = st$gc, # gc is not namespaced
      size = function() st$list(namespace = namespace),
      ## Hack for debugging: keep a reference to the storr object
      ## itself as well
      .st = st
    ),
    ## Not sure if this is correct, but it makes it print nicely
    class = c("cachem")
  )
}

st_cache <- wrap_storr_cache(st)

mysqrt <- function(x) {
  message("Computing sqrt of ", deparse1(x))
  sqrt(x)
}

mysqrt_memo <- memoise(mysqrt, cache = st_cache)

mysqrt_memo(1:10)
mysqrt_memo(1:10)
mysqrt_memo(1:10)

DarwinAwardWinner avatar Apr 26 '21 16:04 DarwinAwardWinner

That seems like a reasonable implementation. I don't know which package it should live in, though.

A few other notes:

  • I think the tryCatch() should catch KeyError rather than the more general error (though I could be wrong in my understanding of how storr works).
  • If this function were to go in storr, that package could avoid taking a dependency on cachem. Instead of calling cachem::key_missing(), it could simply return an object that has the same structure.
  • With disk cache that is implemented in the cachem package, the get() method is atomic -- it doesn't do the check for the key's existence and the fetching the value in two separate steps. This is important for avoiding race conditions. I don't know if storr's object stores work the same way, but it is best if they do. Otherwise, if multiple R processes are sharing the same object store, there can be race conditions where after the existence check and before value fetch, an object is deleted from the store.

wch avatar Apr 26 '21 19:04 wch

I don't know about the rds backend, but I believe the more interesting storr backends all use databases that provide proper atomicity guarantees, even under concurrent access from multiple R processes.

DarwinAwardWinner avatar Apr 26 '21 19:04 DarwinAwardWinner

Oh, it looks like storr specifically allows for the possibility of a key being stored but the corresponding data being deleted from the object store: in this case, it returns HashError as described here: https://richfitz.github.io/storr/articles/storr.html#classed-exceptions

DarwinAwardWinner avatar Apr 26 '21 19:04 DarwinAwardWinner

As for where this code should live, perhaps it should be added to cachem, along with additional code to implement the appropriate cache size/age expiry logic.

DarwinAwardWinner avatar Apr 26 '21 20:04 DarwinAwardWinner