purrr
Progress bars
Would be nice to have support for progress bars in all map functions. This is a nice feature of plyr.
Could use https://github.com/gaborcsardi/progress, although we might need to ask @gaborcsardi to also provide a C API.
There is a header-only C++ API, isn't that good? Although I have to say that it is not very well tested and has fewer features. https://github.com/gaborcsardi/progress#c-api
It would also make sense to put the C++ part in another package, as it is completely independent.
This will not work with mapping functions because we eval R functions from the C code :/ Any user interruption or R error will cause a long jump that bypasses all C++ destructors.
Thus if you have any data on the heap, you'll get leaks. For example, all STL containers, or even a simple std::string, allocate memory dynamically and so need to be destructed appropriately. See discussion in https://github.com/hadley/purrr/commit/e2def88a4039b15e2c8f92247808264fd03bcf4a
Well, there is an R API and a C++ API. It seems reasonable that you would be able to use at least one of them. :)
+1 for this feature
FWIW I wanted to note that I am adding some new progress bar API, which has the nice feature of having (almost) zero overhead when the progress bars are not shown (e.g. non-interactive use), in addition to ease of use. This is how it will look:
progress %~~% lapply(seq, fun, ...)
If progress bars are turned off, then it simply runs the lapply. If progress bars are on, then it appends the progress bar ticks to fun.
I am saying this because it would be great to use it for purrr functions as well.
It'd probably be more purrr-like to have an adverb functional or function operator that takes mapping functionals and adds progress bars to them. With a functional:
# ..f must be another functional that takes a .x and a .f
# ..f must have the usual purrr signature ..f(.x, .f, ...)
with_progress <- function(..x, ..f, .f, ...) {
  .f <- add_progress(.f, length(..x))
  ..f(..x, .f, ...)
}
mtcars %>% with_progress(map, as.character)
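The snippet above relies on an add_progress() helper that the thread never defines. A minimal sketch of what it might look like, assuming base R's utils::txtProgressBar as the display (the helper's name comes from the snippet; its body is an assumption, and lapply stands in for map so the example has no dependencies):

```r
# Hypothetical helper (not defined in the thread): wraps .f so that every
# call advances a text progress bar. Assumes n >= 1, since txtProgressBar
# requires max > min.
add_progress <- function(.f, n) {
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  i <- 0
  function(...) {
    out <- .f(...)
    i <<- i + 1
    utils::setTxtProgressBar(pb, i)
    if (i >= n) close(pb)  # finish the bar after the last element
    out
  }
}

with_progress <- function(..x, ..f, .f, ...) {
  .f <- add_progress(.f, length(..x))
  ..f(..x, .f, ...)
}

# lapply has the same (vector, function) argument order as map
res <- with_progress(mtcars, lapply, as.character)
```

The wrapped function returns exactly what .f returns, so the result is the same as calling lapply(mtcars, as.character) directly.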
A quick thought: I think it's natural to have adverb functionals when we're modifying another functional, like in the example above. But otoh it's natural to have adverb function operators when we're modifying a regular function, e.g. safely(), lift(), etc.
@lionel- Hmmm, maybe I misunderstand something, but why not
mtcars %>% with_progress(map)(as.character)
then? Or is this what you mean in your second comment?
> Or is this what you mean in your second comment?
Yes, this is what I mean. Maybe @hadley has another opinion though.
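For comparison, the function-operator form discussed here could be sketched as follows. This is only an illustration under assumptions: the progress display is base R's utils::txtProgressBar, and vapply stands in for a purrr mapper so the example runs without extra packages.

```r
# Sketch of with_progress() as a function operator: it takes a mapping
# functional ..f and returns a new functional with the same
# (.x, .f, ...) signature that ticks a progress bar on every call to .f.
with_progress <- function(..f) {
  function(.x, .f, ...) {
    n <- length(.x)
    # max(n, 1) avoids an error for empty input (txtProgressBar needs max > min)
    pb <- utils::txtProgressBar(min = 0, max = max(n, 1), style = 3)
    on.exit(close(pb))
    i <- 0
    ticking <- function(...) {
      out <- .f(...)
      i <<- i + 1
      utils::setTxtProgressBar(pb, i)
      out
    }
    ..f(.x, ticking, ...)
  }
}

# The extra argument numeric(1) is forwarded to vapply as FUN.VALUE
res <- with_progress(vapply)(mtcars, sum, numeric(1))
```

Because a fresh bar is created on every call of the returned functional, it can be reused on several inputs without any reset logic.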
Hmmm, actually I quite like this, no extra operator needed. Maybe I should do it with lapply as well. Unfortunately I cannot really do it with for loops. They have to use an operator:
with_pb %~~% for (i in 1:100) { }
> Maybe I should do it with lapply as well
lapply(), vapply() etc. should work for free with this approach since they take a vector as first argument and a function as second :)
mtcars %>% with_progress(vapply, sum, numeric(1))
> Unfortunately I cannot really do it with for loops. They have to use an operator:
There is some discussion about function-like looping in #168 and #135.
I think that progress bars are so useful, there should be minimal friction to use them in purrr. That makes me think that they should be an option (like plyr), or possibly even automatically display given some conditional (e.g. loop has run for 2 seconds and has at least two more to go, like dplyr).
How about automatically displaying them unless (a) this is not an interactive session (b) a global option is set to disable them?
This is a case where it makes sense to have a global option since this is a side effect for user convenience that shouldn't have an impact on the return value. Also it'll still be possible to use withr::with_options() in case it's important to control the option on a case by case basis (though I don't see when that would be useful). I think that's preferable to a plyr-like option that would clutter the function signatures.
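The rule proposed here (show bars unless the session is non-interactive or a global option disables them) might be sketched like this. The option name mypkg.show_progress and both helper names are hypothetical, chosen only for illustration; they are not existing purrr options or functions.

```r
# Hypothetical option name "mypkg.show_progress"; defaults to TRUE when unset.
show_progress <- function() {
  interactive() && isTRUE(getOption("mypkg.show_progress", TRUE))
}

# Base R equivalent of a scoped override: set the option, evaluate the code,
# then restore the previous value even if the code fails.
without_progress <- function(code) {
  old <- options(mypkg.show_progress = FALSE)
  on.exit(options(old))
  force(code)
}

inside <- without_progress(getOption("mypkg.show_progress"))
after <- getOption("mypkg.show_progress")
```

withr::with_options(list(mypkg.show_progress = FALSE), ...) would give the same scoped behaviour as the without_progress() helper, without having to write it by hand.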
Also it's still nice to have a functional to add progress bars to lapply() etc.
Yes, agreed about non-interactive use + global option to turn off. That's what dplyr has too.
Will have to look at whether pbapply could help here. It uses a global option and is turned off when non-interactive.
Also useful to display names if they're present.
Any news on this issue? Has any sort of progress bar been implemented?
Yes I'm with @sillasgonzaga here on looking for an update. In some situations (particularly if there's an API call in the function) I'm dropping back to pbapply.
Any progress on the progress bar?
@sillasgonzaga, @chris-billingham, @tiernanmartin, I am not part of the tidyverse team, but I happen to know that they do their development work in phases, one package at a time. There will be a purrr phase, don't worry!
So I think it does not help to ask for status update every 2 days or every week.
As you seem to be pretty interested in progress bars: in case you don't already know, even though it is not built into purrr, you can currently create a progress bar in the tidyverse.
Here is a dummy example you can run in your session, and it will display a progress bar.
# you can also load all the tidyverse
library(dplyr)
library(purrr)
# dummy list of 10 elements with random numbers
dummy_list <- rerun(10, runif(5))
# create the progress bar with a dplyr function.
pb <- progress_estimated(length(dummy_list))
res <- dummy_list %>%
  map(~{
    # update the progress bar (tick()) and print progress (print())
    pb$tick()$print()
    Sys.sleep(0.5)
    sum(.x)
  })
As you see, it is just two lines to add to your code. Pretty simple.
One to create the progress bar element with dplyr::progress_estimated. It will create an object, pb here, that is an R6 class element. You can find the different methods with pb$<method>. To update the progress bar and print progress, you can just use pb$tick()$print() as you see in the example. You should read the help: help("progress_estimated", package = "dplyr")
It works very well with purrr functions. Only drawback: it makes your piped code a little less concise.
Hope it helps, and that it will tide you over until there is better integration in purrr.
I think we have 3 options to integrate progress bar functionality in purrr:
1. create an adverb to modify the user function, adding tickers to it.
2. add a .progress= parameter inside the map functions.
3. create an adverb to modify the map functions.
(1) is easier to code but will force the user to learn a new adverb that depends on the original function and the input (at least the input length). (2) is harder but is straightforward to the user. (3) is the most general but also the hardest to understand.
To solve (1), I was thinking of something like this adverb, using @gaborcsardi's progress package:
progressively <- function(.f, .n, ...) {
  pb <- progress::progress_bar$new(total = .n, ...)
  function(...) {
    pb$tick()
    .f(...)
  }
}
Simple example:
input <- 1:5
fun <- function(x) {
  Sys.sleep(.2)
  sample(x)
}
progress_fun <- progressively(fun, length(input))
purrr::map(input, progress_fun)
The problem is that if we run this two times the progress bar is not shown, because pb is already complete. But I think it is easy to find a way to restart it when this happens using some environment tricks.
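One way the restart could work is to keep the bar and a counter in the closure's environment and recreate the bar once it has completed. The sketch below is an assumption about how that might look, and it swaps in base R's utils::txtProgressBar for the progress package so that it runs without extra dependencies:

```r
# Restartable variant of progressively(): the bar is created lazily on the
# first call and dropped once .n calls have completed, so the next run
# starts a fresh bar. Assumes .n >= 1.
progressively <- function(.f, .n) {
  pb <- NULL
  i <- 0
  function(...) {
    if (is.null(pb)) {
      pb <<- utils::txtProgressBar(min = 0, max = .n, style = 3)
      i <<- 0
    }
    out <- .f(...)
    i <<- i + 1
    utils::setTxtProgressBar(pb, i)
    if (i >= .n) {
      close(pb)
      pb <<- NULL  # reset so that a later run shows a new bar
    }
    out
  }
}

square <- function(x) x^2
progress_square <- progressively(square, 5)
first <- lapply(1:5, progress_square)
second <- lapply(1:5, progress_square)  # the bar is shown again
```

A run that stops early, for example because .f throws an error, leaves the counter mid-way, so this sketch would still need a reset for that case.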
If (1) is not enough, I think that (2) - add .progress=
option - is the best option, because (3) - modify map functions - is hard to understand. But I also think it will be difficult to code.
There's a fourth option, as suggested by @lionel- and @hadley:
4. Add a progress bar by default if the loop takes more than s seconds and the length of the input is greater than n. Control this in the global options.
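As a rough illustration of this fourth option, the loop below only starts drawing a bar once a delay has passed and only for inputs longer than a threshold. The function name, the option names mypkg.progress_delay and mypkg.progress_min_n, and their defaults are all hypothetical; this is a plain R loop, not how purrr's map functions are implemented.

```r
# Hypothetical lapply-like functional: shows a bar only when the loop has
# already run for more than `delay` seconds and the input has more than
# `min_n` elements. Both thresholds default to (hypothetical) global options.
lapply_auto_progress <- function(x, f, ...,
                                 delay = getOption("mypkg.progress_delay", 2),
                                 min_n = getOption("mypkg.progress_min_n", 10)) {
  n <- length(x)
  out <- vector("list", n)
  start <- Sys.time()
  pb <- NULL
  on.exit(if (!is.null(pb)) close(pb))
  for (i in seq_len(n)) {
    out[i] <- list(f(x[[i]], ...))  # list() keeps NULL results in place
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (is.null(pb) && n > min_n && elapsed > delay) {
      pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
    }
    if (!is.null(pb)) utils::setTxtProgressBar(pb, i)
  }
  names(out) <- names(x)
  out
}

res <- lapply_auto_progress(1:3, function(x) x + 1)
```

For short or fast loops such as this one, no bar is ever created, so the only overhead is one clock read per element.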
That's better than (2) so it's the best approach. Would it require big changes in map functions?
This needs to be tackled at the same time as parallelism support, which we'll start working on soon.
@jtrecenti my vote is toward option 2, .progress = T. Also progressively is just too many characters IMO.
Can't wait! :)
We've been using the furrr package for a while now. It uses the future package to do the hard job. @ctlente created a function named abjutils::pvec() inside the abjutils package that maps a function over a vector safely, in parallel, and with progress bars. It still has many bugs but I found it really, really useful.
Just wanted to chime in to say that I really dig @gaborcsardi's progress package; I much prefer its greater customisability over the simpler dplyr::progress_estimated() (which already works with purrr as per the above example).
So if purrr could support progress, that'd be great.
I probably should have checked here first, but I have produced wrapped versions of the purrr iterators which produce progress bars using the progress package. You can find my very early version here: purrrgress, with the caveat that nothing has been tested yet outside my own use cases.
call for this feature too