purrr
`map`ing rmarkdown::render with data.table
I have encountered a strange bug that I believe originates with some kind of scoping issue between data.table and purrr. When I try to purrr::map a vector of paths of Rmd files to the rmarkdown::render function, the rendering fails.
One such error is as below:
Error: `:=` can only be used within a quasiquoted argument
Reprex:
- an Rmd document
- an R script that renders the Rmd
---Rmarkdown contents saved as test.Rmd---
---
title: "Untitled"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(dplyr)
library(data.table)
.datatable.aware = TRUE
dat <- data.table(x = rnorm(100),
y = rnorm(100),
grouper = sample(letters, 100, replace = TRUE))
dat[ ,z:=x+y]
```
In a separate script I do the following:
x <- "test.Rmd"
# Fails---
purrr::map(x, rmarkdown::render)
#Quitting from lines 13-23 (dattableconflict.Rmd)
#Error: `:=` can only be used within a quasiquoted argument
# Successful--
rmarkdown::render(x)
I tried all of the solutions proposed in this initial post. The fact that the document knits when `render` is called directly suggests that it might be a purrr bug. Additionally, this behaviour does not seem to be isolated to `:=`; it also affects some of the NSE within data.table. For instance:
---
title: "Untitled"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(dplyr)
library(data.table)
.datatable.aware = TRUE
dat <- data.table(x = rnorm(100),
y = rnorm(100),
grouper = sample(letters, 100, replace = TRUE),
date = seq.Date(Sys.Date(), length.out = 100, by = 1))
dat[,.SD[which.min(date)], by = grouper]
```
With:
x <- "test.Rmd"
purrr::map(x, rmarkdown::render)
# Error in which.min(date) :
# cannot coerce type 'closure' to vector of type 'double'
This seems to be a conflict between the `:=` operator from rlang, used by the tidyverse ecosystem, and the one from data.table.
A workaround for your usage would be to render your R Markdown document using a new, empty environment to evaluate your code chunks:
x <- "test.Rmd"
purrr::map(x, ~ rmarkdown::render(.x, envir = new.env()))
Or render in a clean, new R process:
x <- "test.Rmd"
purrr::map(x, ~ {
callr::r_safe(
function(...) rmarkdown::render(...),
args = list(input = .x),
show = TRUE, spinner = FALSE
)
})
As for the issue itself, I wonder whether `evaluate` fails to correctly handle the evaluation of what is inside the `j` part of the `data.table[i, j, by]` syntax. The error in your second example led me to this, because `date` is clearly evaluated as the function and not as the data.table column. Setting `envir =` in `render` to `new.env()`, or calling `callr`, also solves the issue.
Hope it helps.
OK, so I took a quick look at what is discussed in https://github.com/rstudio/rmarkdown/issues/187#issuecomment-52332667, and it seems to be the same kind of issue, but with purrr.
First, I added more verbosity by modifying `test.Rmd`:
- Do not stop on error, with `error = TRUE`
- Activate data.table verbosity with `options(datatable.verbose=TRUE)`
---
title: "Untitled"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error = TRUE)
options(datatable.verbose=TRUE)
```
```{r}
library(dplyr)
library(data.table)
.datatable.aware = TRUE
dat <- data.table(x = rnorm(100),
y = rnorm(100),
grouper = sample(letters, 100, replace = TRUE))
dat[ ,z:=x+y]
```
Then running
x <- "test.Rmd"
purrr::map(x, rmarkdown::render)
will not fail, and you can see more information in the HTML output, including:
## cedta decided 'purrr' wasn't data.table aware.
Full trace
## cedta decided 'purrr' wasn't data.table aware. Here is call stack with [[1L]] applied:
## [[1]]
## purrr::map
##
## [[2]]
## .f
##
## [[3]]
## knitr::knit
##
## [[4]]
## process_file
##
## [[5]]
## withCallingHandlers
##
## [[6]]
## process_group
##
## [[7]]
## process_group.block
##
## [[8]]
## call_block
##
## [[9]]
## block_exec
##
## [[10]]
## in_dir
##
## [[11]]
## evaluate
##
## [[12]]
## evaluate::evaluate
##
## [[13]]
## evaluate_call
##
## [[14]]
## timing_fn
##
## [[15]]
## handle
##
## [[16]]
## try
##
## [[17]]
## tryCatch
##
## [[18]]
## tryCatchList
##
## [[19]]
## tryCatchOne
##
## [[20]]
## doTryCatch
##
## [[21]]
## withCallingHandlers
##
## [[22]]
## withVisible
##
## [[23]]
## eval
##
## [[24]]
## eval
##
## [[25]]
## `[`
##
## [[26]]
## `[.data.table`
##
## [[27]]
## cedta
I guess that when keeping the default `envir` in `render`, it will use `parent.frame()`. As it is evaluated in `purrr::map`, the namespace that runs the lines is purrr, I think. So, based on Matt's comment https://github.com/rstudio/rmarkdown/issues/187#issuecomment-52332667, I tried this on the original Rmd file (without `error = TRUE`):
x <- "test.Rmd"
assignInNamespace("cedta.override", c(data.table:::cedta.override,"purrr"), "data.table")
purrr::map(x, rmarkdown::render)
and it worked.
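The `parent.frame()` effect can be reproduced without any of the three packages. The sketch below is an analogy, not the actual rmarkdown code: `render_like()` is a hypothetical stand-in for a function whose `envir` argument defaults to `parent.frame()`, and base `lapply()` plays the role of `purrr::map()`.

```r
# Hypothetical stand-in for rmarkdown::render(): it only reports which
# top-level environment (a namespace or the global env) its default
# `envir` belongs to.
render_like <- function(input, envir = parent.frame()) {
  environmentName(topenv(envir))
}

# Called directly, the caller is the user's own environment.
direct <- render_like("test.Rmd")

# Passed as-is to a mapper, the caller is the mapper's frame, so the
# default `envir` belongs to the mapper's namespace instead.
mapped <- lapply("test.Rmd", render_like)[[1]]

print(c(direct = direct, mapped = mapped))
```

Run at the top level of a script, `direct` should name the global environment while `mapped` names the namespace that owns the mapper, which matches what the `cedta` trace above reports for purrr.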
So, going by Matt's comment, there may be something to do in purrr so that it is aware of data.table, or purrr could be whitelisted in data.table.
Thanks @cderv ! Also thanks for the debugging tip as well!
TIL that data.table implements a variant of lexical scoping for its methods. When called from a namespace, it checks whether that namespace imports data.table or contains a `.datatable.aware` flag. When that is not the case, the data.table methods delegate to data.frame instead.
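To make that check concrete, here is a simplified sketch in base R. It is an assumption-laden illustration, not data.table's actual `cedta()` implementation: the function name `is_dt_aware_sketch()` is hypothetical, and the real check handles more cases (for example the `cedta.override` list used earlier in this thread).

```r
# Simplified, hypothetical version of the awareness check described above.
is_dt_aware_sketch <- function(calling_env) {
  # Find the namespace (or global env) that the calling code lives in.
  ns <- topenv(calling_env)

  # Assumption: code that is not run from a package namespace
  # (e.g. the global environment) is treated as aware.
  if (!isNamespace(ns)) return(TRUE)

  # A namespace can opt in with a .datatable.aware flag ...
  flagged <- isTRUE(get0(".datatable.aware", envir = ns, inherits = FALSE))

  # ... or by importing data.table.
  imports_dt <- "data.table" %in% names(getNamespaceImports(ns))

  flagged || imports_dt
}

aware_global <- is_dt_aware_sketch(globalenv())
aware_stats  <- is_dt_aware_sketch(asNamespace("stats"))
print(c(global = aware_global, stats = aware_stats))
```

Under this sketch, a call that originates inside a namespace such as stats (or purrr) would not count as aware, which is the situation the trace above describes.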
This is a typical lexical scoping issue. The recommended workaround for this sort of issue is to forward the lexical scope with an anonymous function. That function will be created in, and inherit from, the user environment. Then `rmarkdown::render()` will evaluate in that environment. I think this should work (untested):
purrr::map(x, ~ rmarkdown::render(.x))
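Why the wrapper helps can be sketched with base R alone. This is an illustration under assumptions: `render_like()` is a hypothetical stand-in for a function that defaults `envir` to `parent.frame()`, and `lapply()` stands in for `purrr::map()`.

```r
# Hypothetical stand-in: reports the top-level environment of its caller.
render_like <- function(input, envir = parent.frame()) {
  environmentName(topenv(envir))
}

# Bare function: it is called from inside the mapper, so the caller's
# top-level environment is the mapper's namespace.
bare <- lapply("test.Rmd", render_like)[[1]]

# Wrapped: the anonymous function is created here, in user code, so its
# frame inherits from the user's environment, and that frame is the
# caller that render_like() sees.
wrapped <- lapply("test.Rmd", function(path) render_like(path))[[1]]

print(c(bare = bare, wrapped = wrapped))
```

The wrapper adds one frame whose enclosure is the user's environment, so the default `envir` no longer resolves to the mapper's namespace.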
I don't think you should have sent this PR. It seems like it's a decision for the purrr maintainers to make.
I think this should work (untested)
I did test, and yes, this works with the original test.Rmd:
x <- "test.Rmd"
purrr::map(x, ~ rmarkdown::render(.x))
I don't think there's anything for purrr to do here. It seems like an unfortunate interaction of purrr, rmarkdown, and data.table behaviours that all make sense in isolation.