adv-r
A better explanation needed for section "evaluation" in 20.5.1 Quoting and unquoting
I understand this error is one of the most common ones when using tidyverse inside a function. But I'm unsure I really understand it. Sorry I used GPT to make the writing clearer (not necessarily correct).
Title: Clarification Required: Tidy Evaluation's Need for Explicit "Linkage" Between Expression and Data Mask
Background: The confusion arises from the behavior of eval_tidy when evaluating an expression in the context of a provided data mask (e.g., a dataframe). Why is there a need for an explicit "linkage" between the expression and the data mask, even when the dataframe is provided directly as an argument?
Scenario:
Given the functions:
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}
When calling:
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
subsample(df, x == 1)
The error thrown is: Error in eval_tidy(rows, data): object 'x' not found.
Inferred Understanding from the Error:
- The error message indicates that the `rows` argument in the `subset2` function does capture the expression `x == 1`.
- Inside `subset2`, the `rows` value is directly associated with the expression `x == 1` and its environment, rather than being an abstract placeholder.
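One way to check this inference is to inspect the quosure directly. The sketch below assumes rlang is attached; `capture_rows` and `wrapper` are hypothetical helpers that only mirror the argument passing of `subset2` and `subsample`. It suggests that when the argument is forwarded through a wrapper, the captured expression is the symbol `cond`, not `x == 1`.

```r
library(rlang)

# Hypothetical helper: returns what enquo() captures, without evaluating it.
capture_rows <- function(data, rows) {
  enquo(rows)
}

# Hypothetical wrapper mirroring subsample(): forwards `cond` unquoted.
wrapper <- function(df, cond) {
  capture_rows(df, cond)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

q_direct <- capture_rows(df, x == 1)
q_forwarded <- wrapper(df, x == 1)

# Called directly, the quosure holds the call `x == 1`.
expr_direct <- quo_get_expr(q_direct)

# Called through the wrapper, the quosure holds only the symbol `cond`.
expr_forwarded <- quo_get_expr(q_forwarded)

print(expr_direct)
print(expr_forwarded)
```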
Primary Concern:
- Why does `eval_tidy`, inside `subset2`, require a quosure (with embedded environment information) to evaluate the `rows` expression correctly, even when the dataframe is provided directly?
- Would traditional `eval` exhibit similar behavior?
Explanation:
- Lazy Evaluation in R: R employs lazy evaluation for function arguments. When `subsample(df, x == 1)` is called, the expression `x == 1` isn't evaluated right away. Instead, it is stored as a promise and evaluated, in the caller's environment, the first time `cond` is forced.
- Execution Inside `subset2` and What the Quosure Captures: The `rows_val <- eval_tidy(rows, data)` line is where `cond` (passed as `rows`) is actually evaluated.
  - Providing the data frame (`data`) to `eval_tidy` does set up a data mask, but the mask only applies to symbols that appear in the quosure's expression. Because `subsample` passes `cond` along unquoted, `enquo(rows)` captures the symbol `cond` together with the execution environment of `subsample`, not the expression `x == 1`.
  - When `eval_tidy` evaluates a quosure, symbols in the expression are looked up first in the data mask, then in the quosure's environment, and then in that environment's parents. `cond` is not a column of `data`, so it is found in `subsample`'s environment as an ordinary argument. Forcing it evaluates `x == 1` in the environment that called `subsample`, where no data mask exists, hence "object 'x' not found". The missing "linkage" is that `x == 1` itself never reaches the data mask.
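The lookup order described above can be sketched as follows. This is a minimal illustration assuming rlang is attached; `threshold` and the global `x` are hypothetical variables introduced only for the example.

```r
library(rlang)

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

# Hypothetical variables living in the environment where the quosures are created.
threshold <- 1
x <- 100

# `x` is found in the data mask (the column), `threshold` in the quosure's environment.
q <- quo(x == threshold)
mixed <- eval_tidy(q, df)

# When a name exists in both places, the data mask takes priority.
masked <- eval_tidy(quo(x), df)

# Without a data mask, the same symbol resolves to the environment variable.
unmasked <- eval_tidy(quo(x))

print(mixed)
print(masked)
print(unmasked)
```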
- Role of `enquo` & Immediate Unquoting: The `enquo` function captures an argument's expression, together with the environment it was supplied from, into a quosure. In the corrected `subsample` function, `enquo(cond)` captures `x == 1`, and `!!cond` then inlines that quosure into the call to `subset2`. As a result, `enquo(rows)` inside `subset2` receives the quosure for `x == 1` rather than the symbol `cond`, so `eval_tidy` can look up `x` in the data mask. The corrected `subsample` is:
subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}
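For completeness, here is a self-contained sketch of the corrected version that can be pasted into a fresh session. It assumes rlang is attached. `resample` is not defined anywhere in this post, so the definition below is an assumption (sampling rows with replacement); swap in the real one if it differs.

```r
library(rlang)

# Assumed helper, not defined in this post: draws n rows with replacement.
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subset2 <- function(data, rows) {
  rows <- enquo(rows)                 # capture expression + environment
  rows_val <- eval_tidy(rows, data)   # evaluate with `data` as the data mask
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)         # capture `x == 1` before it is forwarded
  df <- subset2(df, !!cond)   # inline the quosure into the call
  resample(df, n)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

# n is passed explicitly here to keep the example easy to reason about.
out <- subsample(df, x == 1, n = 4)
print(out)
```

The rows drawn are random, but every row in `out` should come from the subset where `x == 1`.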
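- Comparison with Traditional `eval`: Base `eval(expr, data, enclos)` also looks up symbols in `data` first and then in the enclosing environment, so a base R version of `subset2` built on `substitute()` would capture the symbol `cond` and fail in the same way when called through a wrapper. The difference is that base R has no quosures and no `!!` operator, so the wrapper would have to construct the call manually to pass the expression through.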
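To make the comparison concrete, here is a base R sketch with no rlang involved. `subset_base` and `wrapper_base` are hypothetical names, and the example assumes no variable called `x` exists in the calling environment. The wrapper appears to fail for the same reason as the tidy eval version: `substitute()` sees only the symbol `cond`.

```r
# Hypothetical base R analogue of subset2(), using substitute() + eval().
subset_base <- function(data, rows) {
  caller_env <- parent.frame()
  rows <- substitute(rows)                    # captures the expression only
  rows_val <- eval(rows, data, caller_env)    # data first, then caller_env
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Hypothetical wrapper mirroring the uncorrected subsample().
wrapper_base <- function(df, cond) {
  subset_base(df, cond)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

# A direct call works: `x` is found in the data frame.
direct <- subset_base(df, x == 1)

# Through the wrapper, `cond` is forced outside the data frame and `x` is not found.
forwarded <- tryCatch(
  wrapper_base(df, x == 1),
  error = function(e) conditionMessage(e)
)

print(direct)
print(forwarded)
```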