`rowwise()` with a mutate() call with warnings is very slow
As identified by @debruine here https://github.com/tidyverse/dplyr/issues/6236#issuecomment-1103972470 this issue is actually caused when mutate() is used after rowwise() and generates warnings.
This code takes ~11 seconds to run in a clean R session:
library(dplyr)
library(tictoc)
tic()
foo <- tibble(
  id = 1:800
) %>%
  rowwise() %>%
  mutate(group_b_mean = mean(NULL))
toc()
# 11.584 sec elapsed
Please note that, oddly, the same code takes <1 second when run with the {reprex} addin.
Many thanks to debruine for figuring this out.
It has to do with the warnings. You get the same problem without any use of c_across:
foo <- tibble(
  id = 1:800
) %>%
  rowwise() %>%
  mutate(group_b_mean = mean(NULL))
I'm having this issue; it's fairly debilitating for some code that used to run without issues. Is there any status update on this?
Looks like it got slower after this PR. The example above goes from 1.6s to 15s for me, and profvis shows all the time is spent in cli_format().
Goes to 0.05s if you change mean(NULL) to suppressWarnings(mean(NULL)) (obviously not a good long-term solution)
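For anyone who needs the stopgap in the meantime, here is a minimal sketch of that workaround. It assumes you have confirmed the warning is harmless, since suppressWarnings() silences every warning raised by the wrapped expression, not just the expected one.

```r
library(dplyr, warn.conflicts = FALSE)

# Same shape as the example above, but the per-row warning from mean(NULL)
# is muffled before dplyr's warning handling ever sees it.
foo <- tibble(id = 1:800) %>%
  rowwise() %>%
  mutate(group_b_mean = suppressWarnings(mean(NULL))) %>%
  ungroup()
```

Wrapping only the offending call, rather than the whole pipeline, keeps any other warnings visible.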
Minimal reprex:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(id = 1:100)
f <- function() {
  warning()
  1
}
bench::system_time(
  df %>%
    rowwise() %>%
    mutate(x = f())
)
#> process real
#> 3s 3.6s
(Note that you can't reproduce this with the reprex package, because something about the way rmarkdown handles warnings makes the problem go away.)
@lionel- could you take a look to see if there's any obvious way to speed up the warning wrapping in mutate_cols()?
One possible way to go about this would be to collect warnings until all computations are done, then emit the first few warnings and suggest using something like dplyr_last_warnings() to get them all.
Edit: Would also help with #6005.
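To illustrate the collect-then-report idea, here is a rough base R sketch. It is not dplyr's implementation; `collect_warnings()` is a hypothetical helper invented for this example. Each warning is stored cheaply and muffled as it occurs, and a single summary warning is emitted once all computations are done.

```r
# Hypothetical helper: apply `fn` to each element of `xs`, collecting
# warnings instead of formatting and emitting them one at a time.
collect_warnings <- function(xs, fn) {
  collected <- list()
  results <- vector("list", length(xs))

  for (i in seq_along(xs)) {
    results[[i]] <- withCallingHandlers(
      fn(xs[[i]]),
      warning = function(w) {
        # Store the condition object and stop it from propagating further.
        collected[[length(collected) + 1L]] <<- w
        invokeRestart("muffleWarning")
      }
    )
  }

  n <- length(collected)
  if (n > 0L) {
    # Only the first warning is formatted; the rest stay available in the
    # returned list for later inspection.
    warning(
      sprintf(
        "There were %d warnings. The first was: %s",
        n, conditionMessage(collected[[1]])
      ),
      call. = FALSE
    )
  }

  list(results = results, warnings = collected)
}

f <- function(x) {
  warning("problem with ", x)
  1
}

# suppressWarnings() here only hides the single summary warning.
out <- suppressWarnings(collect_warnings(1:100, f))
```

The point of the design is that the expensive formatting step runs once per batch rather than once per row.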
@lionel- I like that idea.
Wooo, much faster now!
library(dplyr, warn.conflicts = FALSE)
df <- tibble(id = 1:100)
f <- function() {
  warning()
  1
}
bench::system_time(
  df %>%
    rowwise() %>%
    mutate(x = f())
)
#> Warning: There were 100 warnings in a `mutate()` step.
#> The first warning was:
#> ! Problem in row 1 while computing `x = f()`.
#> Caused by warning in `f()`:
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 99 remaining warnings.
#> process real
#> 66.8ms 65.8ms
Created on 2022-09-21 with reprex v2.0.2
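As the message above suggests, the remaining warnings can be retrieved afterwards. A short sketch, assuming a dplyr version that exports `last_dplyr_warnings()` (the warning text in `f()` is made up for illustration):

```r
library(dplyr, warn.conflicts = FALSE)

f <- function() {
  warning("something went wrong")
  1
}

# Emits one summary warning instead of 100 individual ones.
res <- tibble(id = 1:100) %>%
  rowwise() %>%
  mutate(x = f()) %>%
  ungroup()

# Retrieve the warnings recorded by the most recent dplyr verb that warned.
w <- last_dplyr_warnings()
```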
Sorry, has this been added to 1.0.10 yet? I'm not clear. How do I get this fix into my version of dplyr?
@sjkiss it's in the dev version.
dplyr 1.1.2 is still slow with this.
@vak please file a new issue with reprex.