dplyr

`rowwise()` with a `mutate()` call that generates warnings is very slow

charliejhadley opened this issue.

As @debruine identified in https://github.com/tidyverse/dplyr/issues/6236#issuecomment-1103972470, this issue occurs when `mutate()` is called after `rowwise()` and the mutate expression generates warnings.

This code takes about 11 seconds to run in a clean R session:

library(dplyr)
library(tictoc)
tic()
foo <- tibble(
  id = 1:800
) %>% 
  rowwise() %>% 
  mutate(group_b_mean = mean(NULL))
toc()
# 11.584 sec elapsed

Oddly, the same code takes less than 1 second when run with the {reprex} addin.

Many thanks to debruine for figuring this out.

charliejhadley, Apr 20 '22

It has to do with the warnings. You get the same problem without using `c_across()` at all:

foo <- tibble(
  id = 1:800
) %>% 
  rowwise() %>% 
  mutate(group_b_mean = mean(NULL))
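A minimal sketch of a control experiment, assuming a dplyr version affected by this issue: `mean(numeric(0))` returns `NaN` without warning, so timing it against `mean(NULL)` (which warns and returns `NA`) should isolate the cost of the warnings. The object names are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:800)

# mean(NULL) warns ("argument is not numeric or logical") and returns NA
t_warn <- system.time(
  res_warn <- df %>% rowwise() %>% mutate(group_b_mean = mean(NULL))
)

# mean(numeric(0)) is a similar per-row call that returns NaN without warning
t_quiet <- system.time(
  res_quiet <- df %>% rowwise() %>% mutate(group_b_mean = mean(numeric(0)))
)

# Elapsed seconds for each variant
c(warn = t_warn[["elapsed"]], quiet = t_quiet[["elapsed"]])
```

If the warning variant is dramatically slower while the silent one is fast, the time is going into warning handling rather than into the per-row computation.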

debruine, Apr 20 '22

I'm having this issue too; it's crippling some code that used to run without problems. Is there any status update on this?

sjkiss, Jul 15 '22

It looks like it got slower after this PR. The example above goes from 1.6 s to 15 s for me, and profvis shows that all the time is spent in `cli_format()`.

It drops to 0.05 s if you change `mean(NULL)` to `suppressWarnings(mean(NULL))` (obviously not a good long-term solution).
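Instead of silencing every warning with `suppressWarnings()`, one narrower stopgap is to muffle only the warning you expect and let anything else through. This is a minimal base R sketch; `muffle_matching()` is a hypothetical helper, not part of dplyr, and matching on message text is fragile because messages can be translated or reworded.

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings whose message
# matches the regular expression `pattern`; all other warnings propagate.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# The expected warning from mean(NULL) is muffled; the result is still NA
x <- muffle_matching(mean(NULL), "not numeric or logical")
```

Inside the pipeline this would read `mutate(group_b_mean = muffle_matching(mean(NULL), "not numeric or logical"))`. Like `suppressWarnings()`, it avoids the slowdown only because the muffled warning never reaches `mutate()`, so it is still a workaround rather than a fix.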

eutwt, Jul 15 '22

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:100) 

f <- function() {
  warning()
  1
}

bench::system_time(
  df %>% 
    rowwise() %>% 
    mutate(x = f())
)
#> process    real 
#>     3s    3.6s 

(Note that you can't run this with the reprex package: something about the way rmarkdown handles warnings makes the problem go away.)

@lionel- could you take a look to see if there's any obvious way to speed up the warning wrapping in `mutate_cols()`?

hadley, Jul 21 '22

One possible approach would be to collect warnings until all computations are done, then emit the first few and suggest using something like `dplyr_last_warnings()` to retrieve all of them.

Edit: this would also help with #6005.
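The collect-then-summarize idea can be sketched in base R. This is an illustration of the technique only, not dplyr's implementation: `map_collecting_warnings()` and `last_collected_warnings()` are hypothetical names, and the helper loops over a plain vector rather than over rowwise groups.

```r
# Hypothetical stash for the warnings collected during the most recent call.
.warning_stash <- new.env()
assign("last", list(), envir = .warning_stash)

# Hypothetical helper: apply `f` to each element of `x`, muffling warnings as
# they occur and recording them with their row, then emit one summary warning.
map_collecting_warnings <- function(x, f, n_show = 1) {
  collected <- list()
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[i] <- list(withCallingHandlers(
      f(x[[i]]),
      warning = function(w) {
        # Record the warning with its row index, then stop it from propagating
        collected[[length(collected) + 1]] <<- list(row = i, warning = w)
        invokeRestart("muffleWarning")
      }
    ))
  }
  assign("last", collected, envir = .warning_stash)

  n <- length(collected)
  if (n > 0) {
    first <- collected[seq_len(min(n_show, n))]
    shown <- vapply(first, function(e) {
      sprintf("row %d: %s", e$row, conditionMessage(e$warning))
    }, character(1))
    warning(
      sprintf(
        "There were %d warnings. The first %d:\n%s\nRun last_collected_warnings() to see all of them.",
        n, length(first), paste(shown, collapse = "\n")
      ),
      call. = FALSE
    )
  }
  out
}

# Retrieve every warning recorded by the last map_collecting_warnings() call.
last_collected_warnings <- function() get("last", envir = .warning_stash)
```

For example, `map_collecting_warnings(1:100, function(i) { warning("bad value"); i })` signals a single summary warning, and `last_collected_warnings()` then holds all 100 records. The per-row cost drops to recording a condition object, while any expensive formatting happens once at the end.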

lionel-, Aug 22 '22

@lionel- I like that idea.

hadley, Aug 22 '22

Wooo, much faster now!

library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:100) 

f <- function() {
  warning()
  1
}

bench::system_time(
  df %>% 
    rowwise() %>% 
    mutate(x = f())
)
#> Warning: There were 100 warnings in a `mutate()` step.
#> The first warning was:
#> ! Problem in row 1 while computing `x = f()`.
#> Caused by warning in `f()`:
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 99 remaining warnings.
#> process    real 
#>  66.8ms  65.8ms

Created on 2022-09-21 with reprex v2.0.2
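A sketch of the follow-up step the new message suggests, assuming a dplyr version that exports `last_dplyr_warnings()` (as the output above indicates). The warning text inside `f()` is illustrative, and the call to `last_dplyr_warnings()` is run as its own command right after the pipeline.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:100)

f <- function() {
  warning("something went wrong")
  1
}

# On versions with the fix this emits one summary warning, not 100
res <- df %>%
  rowwise() %>%
  mutate(x = f())

# Then inspect the stashed warnings; `n` limits how many are shown
last_dplyr_warnings(n = 3)
```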

hadley, Sep 21 '22

Sorry, has this been added to 1.0.10 yet? I'm not clear on that. How do I get this fix into my version of dplyr?

sjkiss, Nov 25 '22

@sjkiss It's in the development version.
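A minimal sketch for checking whether an installed dplyr already includes the change. The `1.1.0` cutoff is an assumption, not stated in this thread; confirm it against the dplyr NEWS file before relying on it. The install commands are left commented out so the check itself changes nothing.

```r
# Assumption: the warning-collection change first shipped on CRAN in dplyr 1.1.0.
fix_version <- "1.1.0"
needs_update <- utils::packageVersion("dplyr") < fix_version

if (needs_update) {
  message("Installed dplyr predates ", fix_version, "; consider updating.")
  # install.packages("dplyr")                   # latest CRAN release
  # remotes::install_github("tidyverse/dplyr")  # development version from GitHub
}
```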

hadley, Nov 28 '22

dplyr 1.1.2 is still slow on this.

vak, Jan 29 '24

@vak Please file a new issue with a reprex.

hadley, Jan 29 '24