dplyr icon indicating copy to clipboard operation
dplyr copied to clipboard

Consider allowing `.env$` in `join_by()`

Open DavisVaughan opened this issue 2 years ago • 1 comments

Motivated by https://github.com/tidyverse/dplyr/issues/6821

library(dplyr, warn.conflicts = FALSE)

df1 <- tibble(
  X1 = c("chr1", "chr2"),
  a = c(1, 2)
)

df2 <- tibble(
  X2 = c("chr2", "chr3"),
  b = c(3, 4)
)

by.1 <- "X1"
by.2 <- "X2"

# The "old" way
# inner_join(df1, df2, by = setNames(by.2, by.1))

# This works but we don't typically encourage using `!!` in high level APIs
inner_join(df1, df2, by = join_by(!!by.1 == !!by.2))
#> # A tibble: 1 × 3
#>   X1        a     b
#>   <chr> <dbl> <dbl>
#> 1 chr2      2     3
 
# It might be nice to be able to do this, but it currently errors
inner_join(df1, df2, by = join_by(.env$by.1 == .env$by.2))
#> Error in `join_by()`:
#> ! The left-hand side of a `$` expression must be either `x$` or `y$`.
#> ℹ Expression 1 contains `.env$by.1`.

Some notes from talking with @lionel-:

  • We have an equivalent-ish to .data$ already with x$ and y$. They don't have a . prefix because I attempted to match the original argument names
  • So it seems like we are missing an equivalent to the .env$ pronoun, and we may as well call it .env$ here too
  • It would look up in the caller_env() of join_by(), and I don't think join_by() would get an env = caller_env() argument to "pass the env through" unlike what is done in glue().
  • It would look up using inherits = TRUE, because it seems like tidy eval does this too, and we generally think that makes sense
  • It would require the result to be a single string, or die trying. i.e. you can only use this to inject a single name from the environment, not to inject an entire expression

DavisVaughan avatar Apr 13 '23 12:04 DavisVaughan

Or, since we don't really promote .data$ and .env$ that much, maybe we can take inspiration from any_of() / all_of() and have a custom helper that join_by() knows about that can look up variables in the surrounding env

The idea is something like this, likely with a different name:

join_by(all_of(by.1) == all_of(by.2))

It seems like tidyselect itself needs a helper that aligns with this idea of selecting exactly one thing from the environment, so if it adds that then we should use the same name here.

DavisVaughan avatar Apr 13 '23 12:04 DavisVaughan