df_print and cached chunks
Hi,
this gave me a few head-scratching moments. When I have an html_document that has already been knitted with its results cached (I mean knitr::opts_chunk$set(cache = TRUE)) and I then decide to show paged tables using
output:
  html_document:
    df_print: paged
in the YAML header, the result keeps rendering as verbatim text output (forgive me the {shiny} lingo).
Now I regard this as obvious, but it is in fact already the second time I have had to solve this "issue". I believe it could be hard for {knitr} and {rmarkdown} to reconcile df_print with cached output, since the printing methods and classes are baked into the cached output itself, but maybe it is worth documenting this behavior or raising a friendly warning. What do you think?
Session info
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000), RStudio 2021.9.1.372
Locale:
LC_COLLATE=Czech_Czechia.1250 LC_CTYPE=Czech_Czechia.1250 LC_MONETARY=Czech_Czechia.1250
LC_NUMERIC=C LC_TIME=Czech_Czechia.1250
Package version:
base64enc_0.1.3 digest_0.6.29 evaluate_0.14 fastmap_1.1.0 glue_1.6.0
graphics_4.1.2 grDevices_4.1.2 highr_0.9 htmltools_0.5.2 jquerylib_0.1.4
jsonlite_1.7.2 knitr_1.37 magrittr_2.0.1 methods_4.1.2 rlang_0.4.12
rmarkdown_2.11 stats_4.1.2 stringi_1.7.6 stringr_1.4.0 tinytex_0.36
tools_4.1.2 utils_4.1.2 xfun_0.29 yaml_2.2.1
Pandoc version: 2.14.0.3
Checklist
When filing a bug report, please check the boxes below to confirm that you have provided us with the information we need. Have you:
- [x] formatted your issue so it is easier for us to read?
- [x] included a minimal, self-contained, and reproducible example?
- [x] pasted the output from xfun::session_info('rmarkdown') in your issue?
- [x] upgraded all your packages to their latest versions (including your versions of R, the RStudio IDE, and relevant R packages)?
- [x] installed and tested your bug with the development version of the rmarkdown package using remotes::install_github("rstudio/rmarkdown")?
Thanks for the suggestion.
We have some documentation and general advice in the R Markdown Cookbook: https://bookdown.org/yihui/rmarkdown-cookbook/cache.html
Among other things, it says:
The most appropriate use case of caching is to save and reload R objects that take too long to compute in a code chunk, and the code does not have any side effects, such as changing global R options via options() (such changes will not be cached). If a code chunk has side effects, we recommend that you do not cache it.
We do not recommend that you set the chunk option cache = TRUE globally in a document. Caching can be fairly tricky. Instead, we recommend that you enable caching only on individual code chunks that are surely time-consuming and do not have side effects.
Following this documentation, an Rmd that processes data and prints a table should look like this:
---
title: "test"
output:
  html_document:
    df_print: paged
---
```{r, message=FALSE, warning=FALSE}
library(dplyr)
```
Let's get the droids' names and their homeworlds
```{r data, cache = TRUE}
droids <- starwars %>% filter(species == "Droid") %>% select(name, homeworld) %>% distinct()
```
```{r}
droids
```
Meaning that the table rendering / printing should not happen in a cached chunk. That way the printing method (which is something of a side effect) is applied correctly.
We could document this specifically for df_print, but really the same applies to any external, generic configuration (here, changing df_print in the YAML) that should affect the output of a cached chunk. Caching means the chunk is not recomputed and its result is loaded from the cache; changing an external configuration won't invalidate the cache unless it is explicitly included in the cache.extra option.
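If a chunk really has to both compute and print while cached, one possible way to use cache.extra is sketched below. This is only an illustration under a few assumptions: the chunk label droids-table is made up, and it relies on rmarkdown::metadata holding the document's YAML at render time, so it would not notice an output format passed directly to rmarkdown::render() instead of being set in the YAML header.

```{r droids-table, cache = TRUE, cache.extra = rmarkdown::metadata$output}
# Hypothetical chunk label. The value of cache.extra becomes part of the cache key,
# so editing anything under `output:` in the YAML (e.g. df_print) changes the key
# and this chunk is re-run instead of being loaded from the old cache.
library(dplyr)
starwars %>%
  filter(species == "Droid") %>%
  select(name, homeworld) %>%
  distinct()
```

Keeping the printing in a separate, uncached chunk, as in the example above, remains the simpler option; cache.extra is more of a fallback when that split is not practical.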
Anyway, I just wanted to clarify. I'll mark this as a doc improvement. Thanks for the suggestion!