rlang icon indicating copy to clipboard operation
rlang copied to clipboard

More robust data mask rechaining

Open lionel- opened this issue 3 years ago • 1 comments

Currently the lexical scope of environments that escape data masking contexts is rather undefined. These environments consistently include the masked data but beyond that it depends which quosure was first evaluated:

e1 <- NULL
q1 <- local({
  foo <- 10
  e1 <<- environment()
  quo(environment())
})

e2 <- NULL
q2 <- local({
  foo <- "wrong"
  e2 <<- environment()
  quo(identity(!!q1))
})

out <- eval_tidy(q2)

evalq(foo, out)
#> [1] "wrong"

identical(env_parent(out), e2)
#> [1] TRUE

That's because of the way data masks are rechained to the quosure env as evaluation progresses from one quosure to the next.

This strategy causes pain for some metaprogramming tasks such as quosure expansion in dplyr::across() (cc @romainfrancois), making it difficult to create quosures inside data masks. We are now at our third patch release of dplyr to deal with edge cases related to this.

An alternative rechaining strategy that would be more predictable and easier to work with would be as follows:

  • Create a new execution environment before evaluating each quosure.
  • Set the parent of that exec env to the data mask. Set the data mask top to the quosure env.
  • After evaluation, set the parent of the exec env back to the quosure env. Set the data mask top to the empty env.

With this design all environments created within data masks would consistently inherit from their original lexical environment. On the other hand, they would no longer inherit from the data mask, which is a potentially important breaking change. We would need to assess damage for tidyverse and tidymodels reverse dependencies (cc @topepo @DavisVaughan).

lionel- avatar Jun 30 '21 12:06 lionel-

dplyr data masks are marked stale after evaluation so removing the mask from the scope would converge towards dplyr behaviour:

library(dplyr)

out <- mtcars %>%
  transmute(fit = list(lm(disp ~ drat)))

fit <- out$fit[[1]]
e <- environment(fit$terms)

rlang::env_print(rlang::env_parent(e))
#> <environment: 0x7f93126f10f8>
#> Parent: <environment: 0x7f9322012090>
#> Bindings:
#> • gear: <lazy>
#> • drat: <lazy>
#> • wt: <lazy>
#> • fit: <lazy>
#> • mpg: <lazy>
#> • hp: <lazy>
#> • cyl: <lazy>
#> • am: <lazy>
#> • qsec: <lazy>
#> • vs: <lazy>
#> • carb: <lazy>

rlang::env_parent(e)$cyl
#> Error: Obsolete data mask.
#> ✖ Too late to resolve `cyl` after the end of `dplyr::mutate()`.
#> ℹ Did you save an object that uses `cyl` lazily in a column in the `dplyr::mutate()` expression ?

lionel- avatar Jul 05 '21 07:07 lionel-