dplyr

Allow .by=row_number() in mutate statements

Open torfason opened this issue 2 years ago • 0 comments

I want to calculate a scale and add it to my data frame. Here are three options that do work:

d |>
  rowwise() |>
  mutate(my_scale_a = mean(c_across(starts_with("M:")), na.rm = TRUE)) |>
  ungroup()

This works and used to be the recommended approach, but now that group_by() is slowly being superseded by the .by argument, it seems rowwise() needs a replacement as well.

My data happens to have an ID column, so I can get the same result (I checked) with:

d |>
  mutate(
    my_scale_b = mean(c_across(starts_with("M:")), na.rm = TRUE),
    .by = `Response ID`
  )

If I did not have an ID column, I could have created one with:

d |>
  mutate(rownum = row_number()) |>
  mutate(
    my_scale_c = mean(c_across(starts_with("M:")), na.rm = TRUE),
    .by = rownum
  )
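
The row-number workaround can be wrapped in a small helper so that the temporary column does not leak into the result. This is only a sketch: `mutate_by_row()` and the `.rownum` column are hypothetical names, not part of dplyr, and the toy data frame stands in for the real `d`.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): emulates `.by = row_number()`
# by grouping on a temporary row-number column.
mutate_by_row <- function(.data, ...) {
  .data |>
    mutate(.rownum = row_number()) |>  # temporary per-row key
    mutate(..., .by = .rownum) |>      # each row becomes its own group
    select(-.rownum)                   # drop the temporary key again
}

# Toy data standing in for `d` (assumed column names)
d <- tibble(
  `M:1` = c(1, 2, NA),
  `M:2` = c(3, 4, 5),
  `M:3` = c(5, NA, 7)
)

res <- d |>
  mutate_by_row(my_scale_d = mean(c_across(starts_with("M:")), na.rm = TRUE))
```

The helper assumes the data has no existing column called `.rownum`; a real implementation would need to pick a name that cannot clash.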

But of course, I then want a fourth way to do this, which does not work:

> d.org |>
+   mutate(my_scale_d = mean(c_across(starts_with("M:")), na.rm = TRUE), .by=row_number())
Error in `mutate()`:
! Problem while evaluating `row_number()`.
Caused by error in `n()`:
! Must only be used inside data-masking verbs like `mutate()`, `filter()`, and `group_by()`.
Run `rlang::last_trace()` to see where the error occurred.

All of this is a long-winded way of saying that row_number() seems to have a very natural interpretation as the .by argument of mutate(): treat each row as its own group.

torfason, Apr 02 '24 17:04