Parallel styling
via furrr.
Closes #277.
Closes #617 (=supersedes).
Things to consider:
- check how suggested and imported CRAN packages use styler and what it means for parallelisation.
- check impact on speed with pre-commit.
- check impact with
/stylecommand in r-lib/actions.
Thanks @krlmlr. Couple of points (some also discussed in #277):
- I am not sure we should use
{furrr}. Is it because of progress bars? We could just use future.apply or similar to not expand the dependency graph (I haven't checked if it actually does). - I think for this to work with
future.apply()in #617, we needed to pass absolute file paths to the styler because the current working directory set withwithr::with_dir(). There must be workarounds as in #617 or as discussed in https://github.com/r-lib/styler/issues/277#issuecomment-595879622. Seems like{furrr}handles that out of the box? - The authors of the future package do not recommend changing the plan in package code (see
help(future::plan())). I am not sure we really want to modify the plan. Why can't we leave to the user and they can set an appropriate plan, e.g. in their.Rprofile? Also, what numbers of cores should/are used with your suggestion. For styling one file or for testing, it's desirable to not turn multicore support on.
Codecov Report
Merging #682 into master will decrease coverage by
0.27%. The diff coverage is30.00%.
@@ Coverage Diff @@
## master #682 +/- ##
==========================================
- Coverage 90.43% 90.15% -0.28%
==========================================
Files 47 47
Lines 2226 2234 +8
==========================================
+ Hits 2013 2014 +1
- Misses 213 220 +7
| Impacted Files | Coverage Δ | |
|---|---|---|
| R/transform-files.R | 93.91% <30.00%> (-6.09%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update c50338f...e8615e8. Read the comment docs.
-
furrr::future_map_lgl()is a drop-in replacement formap_lgl(), gives a progress bar, and has seen an update very recently. It's a suggested package, the default is to revert to serial styling. (We might want to show a one-time message, though, and/or offer aparallel = NAargument. -
It just works with furrr.
-
From
?plan:If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using: ...
and this is what we do here. I'd say this improves user experience -- users who style their code might not want to even learn about future and whatnot.
I see.
- With regard to the future strategy, I'd like to 1) check if a plan has been set explicitly, if yes, follow this (I don't know if this is possible?). If not, set temporarily to
future::plan("multiprocess")and restore on exit. Because with the current implementation andfurrrinstalled, the user has no option to decide on the strategy, which I think is against the spirit offuture(apart from setting the testthat env variable, but that shouldn't be the way to avoid a certain future strategy). - Also, setting the future strategy to
plan('multiprocess')on my macOS almost takes 2s, so we should really only do it if there are two or more files to style. - Regarding progress bar, I note that the
.progressargument will be deprecated in the future in favour of an approach compatible with the progressr package, so not sure we should introduce this here. It is a very visible user-facing change. As I understand from https://github.com/HenrikBengtsson/progressr/issues/78, progressr is only at the beginning and at the moment, you can either force progress display as a developer (which is against the principles of progressr) or ask the user to wrap every call to styler intowith_progress(), which 99% of the users won't do. Other global options are potentially considered later. - Another problem is that we don't get real time updates on the styled files as with sequential processing. Seems like a
future.applyapproach withprogressralso does not allow this, so we might need to drop it.
slow_sqrt <- function(xs) {
p <- progressor(along = xs)
future.apply::future_lapply(xs, function(x) {
message("Calculating the square root of ", x)
Sys.sleep(2)
p(sprintf("x=%g", x))
sqrt(x)
})
}
library(progressr)
handlers("progress")
with_progress(y <- slow_sqrt(1:10))
- No matter how we decide in the end, we should add clear documentation and unit test various cases.
That said, I understand your point that 99% of the users probably don't want to bother with future strategies and progress bar settings but I think it's important to keep things transparent if possible.
OK, I will try to finish this up.
Thanks! Happy to review.
The startup costs are a problem. On Linux we could start one child process with callr, which then could spawn subprocesses very efficiently with the multicore plan. We could do the same on Windows and multiprocess: In both cases this would also eliminate the problem of temporarily altering the local plan. What do you think?
But going through another sub-process will probably not decrease the start-up costs, would it? Because on top of future::plan(), a new R session has to be started, so costs add up? What would be the advantages of using a sub-process? To avoid the RStudio incompatibility of forked processes on UNIX? I don't think much can go wrong with capturing the plan and setting it on exit.
I thought about it again. There is an additional problem: When using future, setting a plan is time consuming, even when it's sequential. Hence, when there are less than 3 files to style and we don't set multisession, we should rather use map_lgl() instead of future_map_lgl(). Also, the user should have the option to fully control the process, as discussed in https://github.com/HenrikBengtsson/future/issues/450. For that reason< I think we should use this heuristic with the env variable R_STYLER_USE_FUTURE: (to be added to the docs):

I think we should make a short vignette for this.