Feature Request: In across, have function for `.names` argument

Open kylebutts opened this issue 1 year ago • 3 comments

One common thing I want to do in a dplyr workflow is take a set of variables, apply some function to them, and create a new set of variables. across makes this really easy, but it almost always requires me to update the names in a second step with rename_with. I'm wondering if a function could be passed to .names that accepts the vector of column names and returns the resulting column names.

For example, I would want:

data |> 
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all, 
    .names = "share_{.col}"
  )) |>
  rename_with(
    \(str) gsub("^share_emp", "share_", str)
  ) 

to become:

data |> 
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all, 
    .names = \(str) gsub("^share_emp", "share_", str)
  ))

Admittedly, I haven't thought of how that would play with other features of the function (e.g. multiple functions). Thanks for the consideration!

kylebutts • Nov 30 '23 18:11

Ahh, I think I figured it out. Since .names is passed to glue(), you can call a function inside the {}, like .names = "{gsub('emp_', 'share_', .col)}". Would y'all recommend a PR with an example of this in the colwise vignette?

library(dplyr)
N <- 100
data <- tibble(
  draw1 = rnorm(N),
  draw2 = rnorm(N),
  draw3 = rnorm(N)
)

data |> 
  mutate(sum = draw1 + draw2 + draw3) |>
  mutate(across(
    draw1:draw3, 
    function(x) x / sum, 
    # .names passes the string to `glue()`,
    # so you can evaluate R code inside the braces:
    .names = "share_{gsub('draw', '', .col)}"
  ))
#> # A tibble: 100 × 7
#>      draw1  draw2   draw3    sum share_1 share_2 share_3
#>      <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
#>  1 -0.779  -1.38   0.293  -1.87   0.418   0.739  -0.157 
#>  2 -0.184  -0.538 -0.331  -1.05   0.175   0.511   0.315 
#>  3 -1.07   -1.08  -0.160  -2.31   0.463   0.468   0.0693
#>  4 -0.309   1.03   0.0462  0.767 -0.403   1.34    0.0602
#>  5  0.359  -0.484  0.857   0.732  0.491  -0.662   1.17  
#>  6  0.330   0.491  0.871   1.69   0.195   0.290   0.515 
#>  7 -0.0563 -0.579 -1.16   -1.80   0.0313  0.322   0.647 
#>  8  0.608   1.33   0.796   2.74   0.222   0.487   0.291 
#>  9 -1.95   -0.158 -0.752  -2.86   0.682   0.0554  0.263 
#> 10  0.267   0.373 -0.383   0.256  1.04    1.46   -1.50  
#> # ℹ 90 more rows

Created on 2023-11-30 with reprex v2.0.2
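
On the open question from the original post about multiple functions: the glue approach appears to compose with them, because `{.fn}` is available alongside `{.col}` in the `.names` specification. A minimal sketch, assuming a toy two-column tibble and two made-up functions (`half` and `double` are illustrative names, not anything from the thread):

```r
library(dplyr)

# Toy data; the values are arbitrary and only serve the illustration.
data <- tibble(draw1 = c(1, 2), draw2 = c(3, 4))

out <- data |>
  mutate(across(
    draw1:draw2,
    # A named list of functions; the list names become {.fn}.
    list(half = \(x) x / 2, double = \(x) x * 2),
    # Strip the "draw" prefix from the column name and prepend the function name.
    .names = "{.fn}_{gsub('draw', '', .col)}"
  ))

names(out)
```

Each input column is combined with each function, so the new columns are named after both the function and the trimmed column name (for example `half_1` and `double_2`).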

kylebutts • Nov 30 '23 22:11

You can also rename the output of across since it's a tibble.
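
A minimal sketch of what this could look like, assuming a small toy tibble (the data and the column name `total` are illustrative, not from the thread). Since `across()` returns a tibble and `mutate()` unpacks unnamed data frame results into columns, the output can be piped into `rename_with()` inside the same `mutate()` call:

```r
library(dplyr)

# Toy data; the values are arbitrary and only serve the illustration.
data <- tibble(draw1 = c(1, 2), draw2 = c(3, 6))

out <- data |>
  mutate(total = draw1 + draw2) |>
  mutate(
    # across() yields a tibble with columns draw1 and draw2;
    # rename_with() renames them before mutate() unpacks the result.
    across(draw1:draw2, \(x) x / total) |>
      rename_with(\(nm) sub("^draw", "share_", nm))
  )

names(out)
```

Because the renamed columns no longer collide with the inputs, the original `draw1` and `draw2` columns are kept and the shares are added as new columns.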

moodymudskipper • Dec 21 '23 10:12

data |> 
  mutate(sum = draw1 + draw2 + draw3) |>
  mutate(across(
    draw1:draw3, 
    function(x) x / sum, 
    .names = "{stringr::str_replace(.col, 'draw', 'share')}"
  ))

perlatex • Jun 08 '24 12:06