knitr customizable cache (closes #2176)

This PR lets developers implement knit_cache_hook methods, which may preprocess objects (e.g., save them to an external file) and define custom loaders.

I will add a NEWS item after we agree on the design.

refactor(cache): use saveRDS/readRDS instead of makeLazyLoadDB/lazyload
- For migration, cache_save() replaces rdb/rdx files with an rds file
- For backward compatibility, cache_load() attempts lazyload() if rdb/rdx files are available
~~feat(cache): allow pre/postprocessing cache objects~~
- knit_cache_preprocess preprocesses objects being saved
- knit_cache_postprocess postprocesses objects being loaded
feat!(cache): implement knit_cache_hook instead of pre/post-processors
- Call knit_cache_hook methods on saving cache
  - Methods may save extra files under ${cache_path(h)}__extra directory
  - Methods may return custom loader functions, which are saved to ${cache_path(h)}.rds
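
As a rough illustration of the saveRDS/readRDS refactor and its backward-compatible fallback, here is a minimal sketch. The helper names `sketch_cache_save` and `sketch_cache_load` and the file layout are assumptions made for illustration; they are not the functions in this PR or in knitr.

```r
# Hypothetical helpers; names and layout are illustrative, not knitr internals.
sketch_cache_save <- function(objs, path) {
  # Migration: drop legacy lazy-load files before writing a single .rds file
  unlink(paste0(path, c(".rdb", ".rdx")))
  saveRDS(objs, paste0(path, ".rds"))
}

sketch_cache_load <- function(path, envir = parent.frame()) {
  rds <- paste0(path, ".rds")
  if (file.exists(rds)) {
    # New format: a named list of objects stored in one .rds file
    list2env(readRDS(rds), envir = envir)
  } else if (all(file.exists(paste0(path, c(".rdb", ".rdx"))))) {
    # Backward compatibility: fall back to the legacy lazy-load database
    lazyLoad(path, envir = envir)
  }
  invisible(envir)
}

# Round trip through a temporary cache path
path <- file.path(tempdir(), "chunk_abc123")
sketch_cache_save(list(x = "foo bar", n = 42L), path)
e <- new.env()
sketch_cache_load(path, envir = e)
```

After the round trip, `e` holds the saved objects under their original names, and only the `.rds` file remains at the cache path.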

With this PR, we can add some hooks on objects to be cached. For example, we can use writeLines to save character objects.

```{r}
library(knitr)
registerS3method(
  "knit_cache_hook",
  "character",
  function(x, nm, path) {
    # Cache x as is unless it is a plain character vector
    # (e.g., leave subclasses of character untouched)
    if (!identical(class(x), "character")) {
      return(x)
    }

    # Preprocess data (e.g., save data to an external file)
    # Create external files under the directory `paste0(path, "__extra")`
    # if knitr should clean them up when refreshing/cleaning the cache
    d <- paste0(path, "__extra")
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
    f <- file.path(d, paste0(nm, '.txt'))
    writeLines(x, f)

    # Return a loader function with the knit_cache_loader class;
    # it accepts an ellipsis to allow for future extensions
    structure(function(...) readLines(f), class = 'knit_cache_loader')
  },
  envir = asNamespace("knitr")
)
```

```{r, cache=TRUE}
x <- 'foo bar'
print(x)
```

```{r}
print(x)
```
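
To show how a returned loader could be used when the cache is read back, here is a minimal sketch. The function `resolve_cached` is hypothetical and only illustrates the idea; the actual loading logic in this PR may differ.

```r
# Hypothetical resolution step: call any stored loader, keep other objects as is.
resolve_cached <- function(objs) {
  lapply(objs, function(obj) {
    if (inherits(obj, "knit_cache_loader")) obj() else obj
  })
}

# Simulate what a knit_cache_hook method might have left behind
f <- tempfile(fileext = ".txt")
writeLines("foo bar", f)
cached <- list(
  x = structure(function(...) readLines(f), class = "knit_cache_loader"),
  y = 1:3
)

restored <- resolve_cached(cached)
```

Here `restored$x` is read back from the external text file through the loader, while `restored$y` is passed through unchanged.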

Apr 26 '24 15:04 atusy

maybe preprocess and postprocess are not good names... :thinking:

Apr 26 '24 15:04 atusy

I need to fix the tests

Apr 26 '24 15:04 atusy

To solve the above problems, I implemented the knit_cache_hook generic function in place of knit_cache_preprocess and knit_cache_postprocess. See the updated description for details.

Apr 30 '24 03:04 atusy

Thanks for the comment. I do not have a strong opinion, but let me leave some comments below.

I accepted the complexity for the following reasons:

- the feature is mainly for package developers, not for end users
- usage is limited (I guess)

With my implementation, users do not have to care about what happens under the hood when caches are saved or loaded.

For developers, I agree that a chunk option is a good idea, and the implementation becomes simpler. However, it requires end users to understand tricks for edge cases. Can we expect end users to read the documentation carefully before they run into trouble with cache behavior?

Aug 27 '24 01:08 atusy

Good points, and I agree. Let me think more about it. Thanks!

Aug 27 '24 02:08 yihui
