dplyr
Feature Request: In across, allow a function for the `.names` argument
One common thing I want to do in the dplyr workflow is take a set of variables, apply some function to them, and create a new set of variables. across makes this really easy, but it almost always requires me to update the names in a second step with rename_with. I'm wondering whether .names could accept a function that takes the vector of column names and returns the resulting column names.
For example, I would want:
data |>
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all,
    .names = "share_{.col}"
  )) |>
  rename_with(
    \(str) gsub("^share_emp_", "share_", str)
  )
to become:
data |>
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all,
    .names = \(str) gsub("^emp_", "share_", str)
  ))
Admittedly, I haven't thought of how that would play with other features of the function (e.g. multiple functions). Thanks for the consideration!
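For reference, a self-contained version of the two-step workaround might look like the sketch below. The toy tibble and its values are made up for illustration. Passing `.cols = starts_with("share_")` to rename_with() is one way to make sure only the newly created columns are renamed.

```r
library(dplyr)

# Made-up toy data: a total column plus per-group employment counts
data <- tibble(
  emp_all = c(10, 20),
  emp_1   = c(4, 5),
  emp_2   = c(6, 15)
)

shares <- data |>
  mutate(across(
    matches("emp_[0-9]+"),          # emp_1, emp_2 (emp_all has no digits)
    ~ .x / emp_all,
    .names = "share_{.col}"         # share_emp_1, share_emp_2
  )) |>
  rename_with(
    \(str) sub("^share_emp_", "share_", str),
    .cols = starts_with("share_")   # only touch the new columns
  )

names(shares)
```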
Ahh, I think I figured it out. Since .names is evaluated with glue(), you can call a function inside the {}, like .names = "{gsub('emp_', 'share_', .col)}". Would y'all recommend a PR with an example of this in the colwise vignette?
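The same trick can approximate the requested function-valued .names: define the renaming function once and call it on .col inside the braces. This is a minimal sketch on made-up data; `to_share()` is a hypothetical helper, not part of dplyr. It relies on dplyr evaluating the template with .col bound to the full vector of selected column names, so a vectorised function such as sub() returns one name per column.

```r
library(dplyr)

# Made-up toy data with the same shape as the example in the request
data <- tibble(
  emp_all = c(10, 20),
  emp_1   = c(4, 5),
  emp_2   = c(6, 15)
)

# Hypothetical helper: input column names in, output column names out
to_share <- function(cols) sub("^emp_", "share_", cols)

shares <- data |>
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all,
    # glue evaluates the braced expression, so any function of .col works
    .names = "{to_share(.col)}"
  ))

names(shares)
```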
library(dplyr)
N <- 100
data <- tibble(
  draw1 = rnorm(N),
  draw2 = rnorm(N),
  draw3 = rnorm(N)
)
data |>
  mutate(sum = draw1 + draw2 + draw3) |>
  mutate(across(
    draw1:draw3,
    function(x) x / sum,
    # .names is passed as a string to `glue()`,
    # so you can call functions inside the braces:
    .names = "share_{gsub('draw', '', .col)}"
  ))
#> # A tibble: 100 × 7
#> draw1 draw2 draw3 sum share_1 share_2 share_3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.779 -1.38 0.293 -1.87 0.418 0.739 -0.157
#> 2 -0.184 -0.538 -0.331 -1.05 0.175 0.511 0.315
#> 3 -1.07 -1.08 -0.160 -2.31 0.463 0.468 0.0693
#> 4 -0.309 1.03 0.0462 0.767 -0.403 1.34 0.0602
#> 5 0.359 -0.484 0.857 0.732 0.491 -0.662 1.17
#> 6 0.330 0.491 0.871 1.69 0.195 0.290 0.515
#> 7 -0.0563 -0.579 -1.16 -1.80 0.0313 0.322 0.647
#> 8 0.608 1.33 0.796 2.74 0.222 0.487 0.291
#> 9 -1.95 -0.158 -0.752 -2.86 0.682 0.0554 0.263
#> 10 0.267 0.373 -0.383 0.256 1.04 1.46 -1.50
#> # ℹ 90 more rows
Created on 2023-11-30 with reprex v2.0.2
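The glue template also covers the multiple-functions case raised in the request: {.fn} expands to the names of a named list of functions, alongside {.col}. A small sketch with made-up deterministic values (instead of rnorm()) so the result is predictable:

```r
library(dplyr)

# Made-up deterministic data
data <- tibble(
  draw1 = c(1, 3),
  draw2 = c(2, 6)
)

summary_tbl <- data |>
  summarise(across(
    draw1:draw2,
    list(mean = mean, max = max),
    # .fn holds the function names, .col the column names
    .names = "{.fn}_{gsub('draw', '', .col)}"
  ))

summary_tbl
```

across() loops over columns first and functions second, so the output columns come out as mean_1, max_1, mean_2, max_2.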
You can also rename the output of across since it's a tibble.
data |>
  mutate(sum = draw1 + draw2 + draw3) |>
  mutate(across(
    draw1:draw3,
    function(x) x / sum,
    .names = "{stringr::str_replace(.col, 'draw', 'share')}"
  ))
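Renaming the tibble that across() returns can be sketched by wrapping across() in rename_with() inside mutate(); mutate() splices an unnamed data frame result into the data. This is a sketch on made-up data. The total column is called `total` here rather than `sum` to avoid confusion with base::sum(), and the \(x) lambda syntax assumes R 4.1 or later.

```r
library(dplyr)

# Made-up data with a precomputed total
data <- tibble(
  draw1 = c(1, 3),
  draw2 = c(3, 1),
  total = c(4, 4)
)

shares <- data |>
  mutate(
    # across() returns a tibble; rename its columns before mutate() splices them in
    rename_with(
      across(draw1:draw2, \(x) x / total),
      \(nm) sub("draw", "share", nm)
    )
  )

names(shares)
```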