R.cache

feature request: list cached calls

Open gezakiss7 opened this issue 10 years ago • 5 comments

It would be great if you could list the calls that have been cached, together with the file names; i.e. to store this information together with the cached values. This way you could examine what calculations took place during a larger set of operations. (By the way, I know the chance of collisions of the keys returned by digest is extremely small; but if they ever did, that would cause a huge problem. But if you wanted to, you could make 100% sure that no collisions happen using the information that you need to store for this request.)

gezakiss7 avatar Feb 03 '15 19:02 gezakiss7

Thanks for the suggestion. Would you mind elaborating a bit more, e.g. by showing what a mockup command call with mockup output could look like?

About your second point: are you arguing that by including sys.call() information you would avoid clashes? It's not clear which function you are referring to, but I avoid this "magic" for saveCache() on purpose, because it's not unlikely that some of the arguments are non-informative from a memoization point of view. For instance, you may not want fit <- estimateAB(..., verbose=TRUE) and fit <- estimateAB(..., verbose=FALSE) to be memoized separately based on argument 'verbose', because that argument is only used for outputting messages during processing.

HenrikBengtsson avatar Feb 03 '15 22:02 HenrikBengtsson

Thanks for writing.

Elaborating on my suggestion: My wish is that there would be a function (e.g. listCached()) that returns a data structure with the list of all cached calls. It could be e.g. a two-column structure, with the first column containing the call (of type call; e.g. fn(0, 'b')), and the second column containing the path to the cache file (e.g. '~/home/.Rcache/f29066cbd18b128da4ddb068145e6ff9.Rcache').

My second point was not what you thought it was (sorry for not being clear; I did not mean storing the default parameters; and what you say regarding that makes sense). What I meant is that, as far as I know (correct me if I'm wrong), the md5 hash can be the same for different calls in extremely rare cases. Even though this should happen extremely rarely, when it does happen, it could cause serious problems. That is why I said that it may be worth checking that you get the cached value for the correct call.

Thanks!

gezakiss7 avatar Feb 14 '15 18:02 gezakiss7

I see.

Useful listing of cache files

So currently the best we can do is to manually list the cache directory, e.g.

> pathnames <- dir(path=getCacheRootPath(), pattern="[.]Rcache$", full.names=TRUE)
> basename(pathnames)
[1] "38006c86ce6f587042ca601e560b525d.Rcache"
[2] "4aea90fd5d45fd31c836b5846cfcf586.Rcache"
[3] "8015f8ba6d8f6a9bde3c994659e0563f.Rcache"
[4] "95a6de15853318c3fc0c7404e3ad947b.Rcache"

and possibly inspect the individual cache headers, e.g.

> readCacheHeader(pathnames[1])
$identifier
[1] "Rcache v0.1.7 (R package R.cache by Henrik Bengtsson)"
$version
[1] "0.1.7"
$comment
[1] ""
$timestamp
[1] "2015-01-06 10:55:06 PST"

As you see, there is a comment field in the cache file header, which can be used for the kind of thing you are looking for (more below). So a prototype of what you're asking for could be:

> listCache()
checksum           size         modified             comment
38006c8...60b525d    2,272,093  2014-11-06 08:55:06  fn(0, 'b')
4aea90f...cfcf586  426,160,128  2014-11-16 10:35:01  fn(0, 'c')
8015f8b...9e0563f  193,319,085  2015-02-06 12:55:46  fn(1, 'b')
95a6de1...3ad947b      134,273  2014-12-13 10:01:30  fn(2, 'b')

Just like dir(), this function should probably accept arguments such as recursive and full.names.
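A rough sketch of how such a listing could be put together from the pieces shown above. The function name listCache() and its readHeader argument are hypothetical (nothing like this exists in R.cache as far as this thread shows); it only combines dir(), file.info() and the header reader, and it leaves out the number formatting and checksum abbreviation from the mockup.

```r
# Hypothetical helper, not part of R.cache: list cache files with their
# size, modification time and header comment.
listCache <- function(path = R.cache::getCacheRootPath(),
                      recursive = FALSE,
                      readHeader = R.cache::readCacheHeader) {
  # Same directory scan as in the manual example above
  pathnames <- dir(path = path, pattern = "[.]Rcache$",
                   full.names = TRUE, recursive = recursive)
  info <- file.info(pathnames)

  # Pull the comment field out of each file header; use "" if missing
  comments <- vapply(pathnames, function(pathname) {
    comment <- readHeader(pathname)$comment
    if (is.null(comment)) "" else as.character(comment)[1]
  }, FUN.VALUE = character(1), USE.NAMES = FALSE)

  data.frame(
    checksum = sub("[.]Rcache$", "", basename(pathnames)),
    size     = info$size,
    modified = info$mtime,
    comment  = comments,
    stringsAsFactors = FALSE
  )
}
```

The readHeader argument is only there so the header reader can be swapped out; with R.cache installed, calling listCache() with no arguments would scan the cache root directory.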

EDIT: Added file size column.

Recording call in cache header

The comment field in the cache file header is an easy way of recording/storing the call behind the memoized/cached value. If done, the call has to be stored as a string, but I don't see a problem with that. I am not sure if it makes sense to add a separate call field, because it may not be a call in all cases, e.g. an expression.

So, in this sense most of the machinery is already there for recording the call string/expression, which I take from your suggestion should be a useful visual clue on what the cache file contains. That's actually the original rationale behind comment, but since introducing it I never found myself using it, so I almost forgot about it. As a start, one could have argument comment default to deparse(sys.call()) for memoizedCall(). Analogously, it could default to deparse(expr) for evalWithMemoization().
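To illustrate what such a default would record, here is a minimal base-R sketch. memoizedCallSketch() is a hypothetical stand-in for memoizedCall(), not its actual implementation; no caching is done, and the saveCache() call is only indicated in a comment.

```r
# Hypothetical stand-in for memoizedCall(); only the comment logic is shown.
memoizedCallSketch <- function(what, ..., comment = NULL) {
  if (is.null(comment)) {
    # deparse() may return several strings for a long call; join them
    comment <- paste(deparse(sys.call()), collapse = " ")
  }
  value <- do.call(what, args = list(...))
  # With R.cache, 'comment' could then be passed on when saving, e.g.
  #   saveCache(value, key = list(what, ...), comment = comment)
  list(value = value, comment = comment)
}

res <- memoizedCallSketch(sum, 1, 2)
res$comment
```

Note that sys.call() captures the call to the wrapper itself, so the recorded string would read memoizedCallSketch(sum, 1, 2) rather than sum(1, 2); whether that is the more useful form for a listing is a design choice.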

Yes, it is so extremely ^ extremely rare for two cache keys to clash that I wouldn't worry about it.
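For anyone who still wants the check described earlier in the thread, one possible approach is to store the key next to the value and compare it on load. The sketch below uses base R's saveRDS()/readRDS() as stand-ins for saveCache()/loadCache(); the helper names saveChecked() and loadChecked() are hypothetical, and the explicit pathname takes the place of the checksum-based file name.

```r
# Hypothetical helpers; saveRDS()/readRDS() stand in for R.cache's
# saveCache()/loadCache(), and 'pathname' for the hashed file name.
saveChecked <- function(value, key, pathname) {
  # Store the full key alongside the value
  saveRDS(list(key = key, value = value), file = pathname)
  invisible(pathname)
}

loadChecked <- function(key, pathname) {
  if (!file.exists(pathname)) return(NULL)
  entry <- readRDS(pathname)
  # Same file but a different stored key would indicate a hash collision;
  # treat it as a cache miss
  if (!identical(entry$key, key)) return(NULL)
  entry$value
}
```

The cost is that every cache file also carries a copy of its key, which may be large if the key includes big objects.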

HenrikBengtsson avatar Feb 16 '15 19:02 HenrikBengtsson

That sounds great!

gezakiss7 avatar Feb 16 '15 19:02 gezakiss7

I tried using comment when I first started using the package but, like @HenrikBengtsson, didn't find it too useful. But this function would have been really useful to me a few weeks ago when I forgot to add an important differentiator to the key. If the comment were autogenerated from the key, perhaps, and I could have listed out all of the cached files, I would have seen pretty quickly that I forgot to include something to designate different variables.

dougmitarotonda avatar Dec 06 '16 17:12 dougmitarotonda