`map`ing rmarkdown::render with data.table

Open medewitt opened this issue 4 years ago • 8 comments

I have encountered a strange bug that I believe originates with some kind of scoping issue between data.table and purrr. When I try to purrr::map a vector of paths of Rmd files to the rmarkdown::render function, the rendering fails.

One such error is as below:

Error: `:=` can only be used within a quasiquoted argument

Reprex:

  • an Rmd document
  • an R script that renders the Rmd

---Rmarkdown contents saved as test.Rmd---

---
title: "Untitled"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r}
library(dplyr)
library(data.table)
.datatable.aware = TRUE

dat <- data.table(x = rnorm(100),
                  y = rnorm(100),
                  grouper = sample(letters, 100, replace = TRUE))

dat[ ,z:=x+y]

```

In a separate script I do the following:


x <- "test.Rmd"

# Fails---
purrr::map(x, rmarkdown::render)
#Quitting from lines 13-23 (dattableconflict.Rmd) 
#Error: `:=` can only be used within a quasiquoted argument

# Successful--
rmarkdown::render(x)

I tried all of the solutions proposed in this initial post. The fact that the document knits when render is called directly seems to indicate that this might be a purrr bug. Additionally, this behaviour does not seem to be isolated to :=; it also affects some of the NSE within data.table. For instance:

---
title: "Untitled"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r}
library(dplyr)
library(data.table)
.datatable.aware = TRUE

dat <- data.table(x = rnorm(100),
                  y = rnorm(100),
                  grouper = sample(letters, 100, replace = TRUE),
                  date = seq.Date(Sys.Date(), length.out = 100, by = 1))

dat[,.SD[which.min(date)], by = grouper]

```

With:

x <- "test.Rmd"

purrr::map(x, rmarkdown::render)
# Error in which.min(date) : 
#  cannot coerce type 'closure' to vector of type 'double'

medewitt avatar Aug 19 '20 19:08 medewitt

This looks like a conflict between two := operators: the one from rlang, used across the tidyverse ecosystem, and the one from data.table.

A workaround for your use case is to render the R Markdown document with a new, empty environment in which the code chunks are evaluated:

x <- "test.Rmd"

purrr::map(x, ~ rmarkdown::render(.x, envir = new.env()))

or to render in a clean, new R process:

x <- "test.Rmd"

purrr::map(x, ~ {
  callr::r_safe(
    function(...) rmarkdown::render(...), 
    args = list(input = .x),
    show = TRUE, spinner = FALSE
  )
})

As for the underlying issue, I wonder whether evaluate fails to handle correctly the evaluation of what is inside the j part of the data.table[i, j, by] syntax. The error in your second example led me to this, because it clearly evaluates date as the function and not as the data.table column. Setting envir = new.env() in render, or calling callr, also solves the issue.
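The second error can be reproduced without data.table at all, which hints at what is going wrong. With a plain data.frame, the j argument is an ordinary expression evaluated in the calling environment, so date resolves to the base R function date() rather than to a column. A minimal base-R sketch (df is a made-up example, not the data from the report):

```r
# A plain data.frame with a column named `date` (hypothetical example data).
df <- data.frame(date = Sys.Date() + 0:2)

# data.frame subsetting does not evaluate `j` inside the data: `date` is looked
# up in the calling environment, where it finds base::date(), a function.
res <- tryCatch(
  df[, which.min(date)],
  error = function(e) e
)

# The call fails because which.min() receives a function instead of a vector.
inherits(res, "error")
conditionMessage(res)
```

If the data.table method falls back to data.frame behaviour for some reason, the same lookup would explain the 'closure' error reported above.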

hope it helps.

cderv avatar Aug 20 '20 08:08 cderv

OK, so I took a quick look at what is discussed in https://github.com/rstudio/rmarkdown/issues/187#issuecomment-52332667, and it seems to be the same kind of issue, but with purrr.

First, I added more verbosity by modifying test.Rmd:

  • Do not stop on error, with error = TRUE
  • Activate data.table verbosity with options(datatable.verbose=TRUE)

---
title: "Untitled"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error = TRUE)
options(datatable.verbose=TRUE)
```

```{r}
library(dplyr)
library(data.table)
.datatable.aware = TRUE

dat <- data.table(x = rnorm(100),
                  y = rnorm(100),
                  grouper = sample(letters, 100, replace = TRUE))

dat[ ,z:=x+y]
```

Then running

x <- "test.Rmd"
purrr::map(x, rmarkdown::render)

will not fail, and you can see more information in the HTML output, including:

## cedta decided 'purrr' wasn't data.table aware.
Full trace
## cedta decided 'purrr' wasn't data.table aware. Here is call stack with [[1L]] applied:
## [[1]]
## purrr::map
## 
## [[2]]
## .f
## 
## [[3]]
## knitr::knit
## 
## [[4]]
## process_file
## 
## [[5]]
## withCallingHandlers
## 
## [[6]]
## process_group
## 
## [[7]]
## process_group.block
## 
## [[8]]
## call_block
## 
## [[9]]
## block_exec
## 
## [[10]]
## in_dir
## 
## [[11]]
## evaluate
## 
## [[12]]
## evaluate::evaluate
## 
## [[13]]
## evaluate_call
## 
## [[14]]
## timing_fn
## 
## [[15]]
## handle
## 
## [[16]]
## try
## 
## [[17]]
## tryCatch
## 
## [[18]]
## tryCatchList
## 
## [[19]]
## tryCatchOne
## 
## [[20]]
## doTryCatch
## 
## [[21]]
## withCallingHandlers
## 
## [[22]]
## withVisible
## 
## [[23]]
## eval
## 
## [[24]]
## eval
## 
## [[25]]
## `[`
## 
## [[26]]
## `[.data.table`
## 
## [[27]]
## cedta

I guess that when the default envir is kept in render, it uses parent.frame(). As the code is evaluated inside purrr::map, I think the namespace that runs the lines is purrr. So, based on Matt's comment (https://github.com/rstudio/rmarkdown/issues/187#issuecomment-52332667), I tried this on the original Rmd file (without error = TRUE):

x <- "test.Rmd"

assignInNamespace("cedta.override", c(data.table:::cedta.override,"purrr"), "data.table")
purrr::map(x, rmarkdown::render)

and it worked.

So, going by Matt's comment, there may be something to do in purrr to make it aware of data.table, or purrr could be whitelisted in data.table.

cderv avatar Aug 20 '20 08:08 cderv

Thanks @cderv ! Also thanks for the debugging tip as well!

medewitt avatar Aug 20 '20 10:08 medewitt

TIL that data.table implements a variant of lexical scoping of data.table methods. When called from a namespace, it checks whether that namespace imports data.table or whether it contains a .datatable.aware flag. When that is not the case, the data.table methods delegate to data.frame instead.
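To make the rule concrete, here is a simplified base-R model of such a check. The function is_dt_aware is hypothetical and is not data.table's actual implementation; it only encodes the two conditions described above, and it assumes that code run outside any package namespace counts as aware.

```r
# Simplified, hypothetical model of the awareness check described above.
# This is NOT data.table's implementation, only an illustration of the rule.
is_dt_aware <- function(env) {
  ns <- topenv(env)
  # Assumption: code run outside a package namespace (user code) is aware.
  if (!isNamespace(ns)) return(TRUE)
  # A package can opt in by defining a `.datatable.aware` flag in its namespace.
  if (isTRUE(ns[[".datatable.aware"]])) return(TRUE)
  # Otherwise the package must import data.table.
  "data.table" %in% names(getNamespaceImports(ns))
}

# User code evaluated in the global environment.
is_dt_aware(globalenv())  # TRUE

# A frame whose enclosure is a package namespace, like the frame of purrr::map().
# stats stands in for purrr here because it ships with R.
pkg_frame <- new.env(parent = asNamespace("stats"))
is_dt_aware(pkg_frame)    # FALSE
```

Under this model, chunk code evaluated in a frame that belongs to purrr is judged by purrr's namespace, not by the user's script.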

lionel- avatar Aug 20 '20 11:08 lionel-

This is a typical lexical scoping issue. The recommended workaround for this sort of issue is to forward the lexical scope with an anonymous function. That function will be created in, and inherit from, the user environment. Then rmarkdown::render() will evaluate in that environment. I think this should work (untested):

purrr::map(x, ~ rmarkdown::render(.x))
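
The effect can be sketched in base R without purrr or rmarkdown. In the sketch below, fake_render is a hypothetical stand-in for rmarkdown::render() that, like render(), defaults envir to parent.frame(), and lapply() plays the role of purrr::map():

```r
# Hypothetical stand-in for rmarkdown::render(): it defaults `envir` to
# parent.frame() and reports the top-level environment that frame belongs to.
fake_render <- function(input, envir = parent.frame()) {
  environmentName(topenv(envir))
}

# Passed directly, the caller is lapply() itself, whose frame belongs to base.
direct <- lapply("test.Rmd", fake_render)[[1]]

# Wrapped in an anonymous function, the caller is that function's frame, which
# inherits from the environment where the function was written.
wrapped <- lapply("test.Rmd", function(path) fake_render(path))[[1]]

direct   # "base"
wrapped  # "R_GlobalEnv" when run at the top level of a script or session
```

The wrapper does no extra work; it only changes which environment parent.frame() resolves to.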

lionel- avatar Aug 20 '20 11:08 lionel-

Added a PR to add purrr to the data.table whitelist in the meantime.

Thanks for your support!

medewitt avatar Aug 20 '20 12:08 medewitt

I don't think you should have sent this PR. It seems like it's a decision for the purrr maintainers to make.

lionel- avatar Aug 20 '20 12:08 lionel-

> I think this should work (untested)

I did test it, and yes, this works with the original test.Rmd:

x <- "test.Rmd"
purrr::map(x, ~ rmarkdown::render(.x))

cderv avatar Aug 20 '20 12:08 cderv

I don't think there's anything for purrr to do here. It seems like an unfortunate interaction of purrr, rmarkdown, and data.table behaviours that all make sense in isolation.

hadley avatar Aug 24 '22 11:08 hadley