R.cache
feature request: list cached calls
It would be great if you could list the calls that have been cached, together with the file names; i.e. to store this information together with the cached values. This way you could examine what calculations took place during a larger set of operations. (By the way, I know the chance of collisions of the keys returned by digest is extremely small; but if they ever did, that would cause a huge problem. But if you wanted to, you could make 100% sure that no collisions happen using the information that you need to store for this request.)
Thanks for the suggestion. Would you mind elaborating a bit more, e.g. by showing what a mockup command call with mockup output could look like?
About your second point; are you arguing that by including sys.call() information you would avoid clashes? It's not clear which function you are referring to, but I avoid this "magic" for saveCache() on purpose, because it's not unlikely that some of the arguments are non-informative from a memoization point of view. For instance, it may be that you don't want fit <- estimateAB(..., verbose=TRUE) and fit <- estimateAB(..., verbose=FALSE) to memoize based on argument 'verbose', because that is only used for outputting messages during processing.
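To make the point about non-informative arguments concrete, here is a minimal sketch that assumes R.cache's loadCache() and saveCache() with an explicit key. The body of estimateAB() and the cache directory name are made up for illustration; only the handling of the key matters.

```r
library(R.cache)

# Assumption: use a throwaway cache directory so the sketch does not
# touch a real cache.
setCacheRootPath(file.path(tempdir(), "Rcache-verbose-demo"))

# Hypothetical estimator: 'verbose' only controls messages, so it is
# deliberately left out of the cache key.
estimateAB <- function(x, verbose = FALSE) {
  key <- list("estimateAB", x = x)
  fit <- loadCache(key = key)
  if (!is.null(fit)) return(fit)       # cache hit, whatever 'verbose' is
  if (verbose) message("Estimating A and B ...")
  fit <- list(a = mean(x), b = sd(x))  # placeholder computation
  saveCache(fit, key = key)
  fit
}

fit1 <- estimateAB(1:10, verbose = TRUE)   # computes and caches
fit2 <- estimateAB(1:10, verbose = FALSE)  # served from the same cache file
```

Had the key been derived automatically from the full call, the two calls would have produced two separate cache files holding the same result.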
Thanks for writing.
Elaborating on my suggestion: My wish is that there would be a function (e.g. listCached()) that returns a data structure with the list of all cached calls. It could be e.g. a two-column structure, with the first column containing the call (of type call; e.g. fn(0, 'b')), and the second column containing the path to the cache file (e.g. '~/home/.Rcache/f29066cbd18b128da4ddb068145e6ff9.Rcache').
My second point was not what you thought it was (sorry for not being clear; I did not mean storing the default parameters; and what you say regarding that makes sense). What I meant is that, as far as I know (correct me if I'm wrong), the md5 hash can be the same for different calls in extremely rare cases. Even though this should happen extremely rarely, when it does happen, it could cause serious problems. That is why I said that it may be worth checking that you get the cached value for the correct call.
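A minimal sketch of such a check, assuming it is layered on top of R.cache rather than built into it: store the key next to the value and compare it on load. The wrappers saveVerified() and loadVerified() are hypothetical names, not part of the package.

```r
library(R.cache)

# Assumption: a throwaway cache directory for the sketch.
setCacheRootPath(file.path(tempdir(), "Rcache-verify-demo"))

# Hypothetical wrapper: store the key together with the value.
saveVerified <- function(value, key) {
  saveCache(list(key = key, value = value), key = key)
}

# Hypothetical wrapper: only return the value if the stored key matches.
loadVerified <- function(key) {
  entry <- loadCache(key = key)
  if (is.null(entry)) return(NULL)               # nothing cached
  if (!identical(entry$key, key)) return(NULL)   # checksum clash: treat as a miss
  entry$value
}

key <- list("fn", 0, "b")
saveVerified(42, key = key)
value <- loadVerified(key)
```

The cost is that every cache file also carries a copy of its key, which can be large when the key includes big data objects.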
Thanks!
I see.
Useful listing of cache files
So currently the best we can do is to manually list the cache directory, e.g.
> pathnames <- dir(path=getCacheRootPath(), pattern="[.]Rcache$", full.names=TRUE)
> basename(pathnames)
[1] "38006c86ce6f587042ca601e560b525d.Rcache"
[2] "4aea90fd5d45fd31c836b5846cfcf586.Rcache"
[3] "8015f8ba6d8f6a9bde3c994659e0563f.Rcache"
[4] "95a6de15853318c3fc0c7404e3ad947b.Rcache"
and possibly inspect the individual cache headers, e.g.
> readCacheHeader(pathnames[1])
$identifier
[1] "Rcache v0.1.7 (R package R.cache by Henrik Bengtsson)"
$version
[1] "0.1.7"
$comment
[1] ""
$timestamp
[1] "2015-01-06 10:55:06 PST"
As you see, there is a comment field in the cache file header, which can be utilized for the things you are looking for (more below). So a prototype of what you're asking for could be:
> listCache()
checksum                  size  modified             comment
38006c8...60b525d    2,272,093  2014-11-06 08:55:06  fn(0, 'b')
4aea90f...cfcf586  426,160,128  2014-11-16 10:35:01  fn(0, 'c')
8015f8b...9e0563f  193,319,085  2015-02-06 12:55:46  fn(1, 'b')
95a6de1...3ad947b      134,273  2014-12-13 10:01:30  fn(2, 'b')
Just like dir(), this function should probably accept arguments such as recursive and full.names.
EDIT: Added file size column.
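A rough sketch of how such a prototype could be put together from pieces that already exist, assuming dir(), file.info() and readCacheHeader() are enough. listCache() is still a hypothetical function here, and the column names are only a suggestion.

```r
library(R.cache)

# Hypothetical helper, not part of R.cache: one row per cache file.
listCache <- function(path = getCacheRootPath(), recursive = FALSE,
                      full.names = FALSE) {
  pathnames <- dir(path = path, pattern = "[.]Rcache$",
                   recursive = recursive, full.names = TRUE)
  info <- file.info(pathnames)
  # Pull the comment field out of each cache file header
  comments <- vapply(pathnames, function(pathname) {
    comment <- readCacheHeader(pathname)$comment
    if (length(comment) == 0L) "" else as.character(comment)[1]
  }, FUN.VALUE = character(1), USE.NAMES = FALSE)
  data.frame(
    file     = if (full.names) pathnames else basename(pathnames),
    size     = info$size,
    modified = info$mtime,
    comment  = comments,
    stringsAsFactors = FALSE
  )
}

# Usage, assuming a throwaway cache directory
setCacheRootPath(file.path(tempdir(), "Rcache-list-demo"))
saveCache(1:10, key = list("fn", 0, "b"), comment = "fn(0, 'b')")
cached <- listCache()
print(cached)
```

The comment column is only as informative as whatever was passed as comment when the file was saved; files saved without one show an empty string.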
Recording call in cache header
The comment
field in file cache header is an easy way for recording/storing the call behind the memoized/cached value. If done, the call have to be stored as a string, but I don't see the problem with that. I am not sure if makes sense to add a separate call
field, because it may not be a call in all cases, e.g. an expression.
So, in this sense most of the machinery is already there for recording the call string/expression, which I take from your suggestion should be a useful visual clue on what the cache file contains. That's actually the original rationale behind comment, but since introducing it I never found myself using it, so I almost forgot about it. As a start, one could have argument comment default to deparse(sys.call()) for memoizedCall(). Analogously, it could default to deparse(expr) for evalWithMemoization().
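To illustrate the idea, here is a minimal sketch of the expression variant, written as a stand-alone wrapper rather than as a change to evalWithMemoization(). The name evalCached() is hypothetical, and keying on the deparsed expression alone is a simplification: values of variables used in the expression are not part of the key.

```r
library(R.cache)

# Assumption: a throwaway cache directory for the sketch.
setCacheRootPath(file.path(tempdir(), "Rcache-comment-demo"))

# Hypothetical wrapper: memoize an expression and record its deparsed
# form in the 'comment' field of the cache file header.
evalCached <- function(expr, envir = parent.frame()) {
  expr <- substitute(expr)
  comment <- paste(deparse(expr), collapse = " ")
  key <- list(expr = comment)
  value <- loadCache(key = key)
  if (is.null(value)) {
    value <- eval(expr, envir = envir)
    saveCache(value, key = key, comment = comment)
  }
  value
}

res <- evalCached(sqrt(16))

# The call string can now be read back from the header
pathname <- findCache(key = list(expr = "sqrt(16)"))
hdr <- readCacheHeader(pathname)
print(hdr$comment)
```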
Yes, it is so extremely ^ extremely rare for two cache keys to clash that I wouldn't worry about it.
That sounds great!
I tried using comment when I first started using the package, but like @HenrikBengtsson didn't find it too useful. But this function would have been really useful to me a few weeks ago when I forgot to add an important differentiator to the key. If the comment were autogenerated from the key, perhaps, and I could have listed all of the cached files out, I would have seen pretty quickly that I forgot to include something to designate different variables.