customizable cache (closes #2176)

Open: atusy opened this pull request 9 months ago (5 comments)

This PR allows implementing knit_cache_hook methods which may preprocess objects (e.g., save to an external file) and define custom loaders.

I will add a NEWS item after we agree on the design.

  • refactor(cache): use saveRDS/readRDS instead of makeLazyLoadDB/lazyLoad
    • For migration, cache_save replaces the rdb/rdx files with an rds file
    • For backward compatibility, cache_load() attempts lazyLoad() if rdb/rdx files are available
  • ~~feat(cache): allow pre/postprocessing of cache objects~~
    • knit_cache_preprocess preprocesses objects being saved
    • knit_cache_postprocess postprocesses objects being loaded
  • feat!(cache): implement knit_cache_hook instead of pre/post-processors
    • Call knit_cache_hook methods when saving the cache
      • Methods may save extra files under the ${cache_path(h)}__extra directory
      • Methods may return custom loader functions, which are saved to ${cache_path(h)}.rds
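The save/load flow in the list above can be sketched in plain R. This is a minimal illustration under assumptions, not the PR's code: `save_cache_sketch()`, `load_cache_sketch()`, and the `hook` argument are hypothetical names, and the sketch simply prefers an `.rds` file over an old `.rdb`/`.rdx` pair. Only `saveRDS()`, `readRDS()`, and `lazyLoad()` are real base R functions here.

```r
# Hypothetical sketch of the cache flow described above; not knitr's code.

# Save `objs` (a named list) to `<path>.rds`. `hook(x, nm, path)` may replace
# each object with a loader function. Old lazy-load files are deleted,
# mirroring the migration step.
save_cache_sketch <- function(objs, path, hook = function(x, nm, path) x) {
  processed <- Map(function(x, nm) hook(x, nm, path), objs, names(objs))
  saveRDS(processed, paste0(path, ".rds"))
  unlink(paste0(path, c(".rdb", ".rdx")))  # unlink() ignores missing files
  invisible(paste0(path, ".rds"))
}

# Restore cached objects into `envir`, calling any loader found in the cache.
load_cache_sketch <- function(path, envir = parent.frame()) {
  rds <- paste0(path, ".rds")
  if (file.exists(rds)) {
    objs <- readRDS(rds)
    for (nm in names(objs)) {
      obj <- objs[[nm]]
      # A loader restores the original object from wherever the hook put it
      if (inherits(obj, "knit_cache_loader")) obj <- obj()
      assign(nm, obj, envir = envir)
    }
  } else if (file.exists(paste0(path, ".rdb"))) {
    # Backward compatibility: caches written with makeLazyLoadDB()
    lazyLoad(path, envir = envir)
  } else {
    stop("no cache found at ", path)
  }
  invisible(envir)
}
```

Storing a loader instead of the object keeps the `.rds` file small when a hook has already written the data elsewhere; objects the hook returns unchanged are stored as usual.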

With this PR, we can add hooks for objects to be cached. For example, we can use writeLines() to save character objects to text files.

```{r}
library(knitr)
registerS3method(
  "knit_cache_hook",
  "character",
  function(x, nm, path) {
    # Cache x as is if its class is anything other than plain "character"
    # (e.g., a subclass of character)
    if (!identical(class(x), "character")) {
      return(x)
    }

    # Preprocess data (e.g., save data to an external file).
    # Create external files under the directory `paste0(path, "__extra")`
    # if knitr should clean them up when refreshing/cleaning the cache
    d <- paste0(path, "__extra")
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
    f <- file.path(d, paste0(nm, ".txt"))
    writeLines(x, f)

    # Return a loader function that accepts `...` for future extensions and
    # has the knit_cache_loader class. It is created in an environment that
    # holds only `f`, so that saving the loader does not also serialize `x`
    loader <- local(
      function(...) readLines(f),
      envir = list2env(list(f = f), parent = baseenv())
    )
    structure(loader, class = "knit_cache_loader")
  },
  envir = asNamespace("knitr")
)
```
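The same pattern should extend to other classes. As a hedged sketch, assuming the `knit_cache_hook(x, nm, path)` signature and the `knit_cache_loader` class from the example above (both exist only on this PR's branch), a method for plain data frames could write CSV files. `knit_cache_hook_data_frame()` is a hypothetical name; it is defined as a standalone function so it can be tried without the PR, and would be registered with `registerS3method("knit_cache_hook", "data.frame", ...)` as above. CSV does not preserve every column type (factors, for instance, come back as character), so this only illustrates the mechanism.

```r
# Hypothetical knit_cache_hook method for plain data frames (illustration only)
knit_cache_hook_data_frame <- function(x, nm, path) {
  # Leave subclasses (tibbles, data.tables, ...) to the default behavior
  if (!identical(class(x), "data.frame")) {
    return(x)
  }

  # Write the data under the `__extra` directory so it is cleaned up with the cache
  d <- paste0(path, "__extra")
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
  f <- file.path(d, paste0(nm, ".csv"))
  utils::write.csv(x, f, row.names = FALSE)

  # Keep only `f` in the loader's environment so that serializing the loader
  # does not also serialize `x`
  env <- list2env(list(f = f), parent = baseenv())
  loader <- local(function(...) utils::read.csv(f), envir = env)
  structure(loader, class = "knit_cache_loader")
}
```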

```{r, cache=TRUE}
x <- 'foo bar'
print(x)
```

```{r}
print(x)
```

atusy, Apr 26 '24 15:04

Maybe preprocess and postprocess are not good names... 🤔

atusy, Apr 26 '24 15:04

I need to fix the tests.

atusy, Apr 26 '24 15:04

To solve the problems above, I implemented the knit_cache_hook generic function in place of knit_cache_preprocess and knit_cache_postprocess. See the updated description for details.

atusy, Apr 30 '24 03:04

Thanks for the comment. I do not have a strong opinion, but let me leave some comments below.

I accepted the complexity for the following reasons:

  • the feature is mainly for package developers and not for end users
  • usage is limited (I guess)

With my implementation, users do not have to care about what happens under the hood when caches are saved or loaded.

For developers, I agree that a chunk option is a good idea, and it would make the implementation simpler. However, it would require end users to understand tricks for edge cases. Can we expect end users to read the documentation carefully before they run into trouble with cache behavior?

atusy, Aug 27 '24 01:08

Good points, and I agree. Let me think more about it. Thanks!

yihui, Aug 27 '24 02:08