via furrr.

Closes #277.

Closes #617 (=supersedes).

Things to consider:

check how suggested and imported CRAN packages use styler and what it means for parallelisation.
check impact on speed with pre-commit.
check impact with /style command in r-lib/actions.

Oct 20 '20 07:10 krlmlr

Thanks @krlmlr. Couple of points (some also discussed in #277):

I am not sure we should use {furrr}. Is it because of progress bars? We could just use future.apply or similar to not expand the dependency graph (I haven't checked if it actually does).
I think for this to work with future.apply() in #617, we needed to pass absolute file paths to the styler because the current working directory set with withr::with_dir(). There must be workarounds as in #617 or as discussed in https://github.com/r-lib/styler/issues/277#issuecomment-595879622. Seems like {furrr} handles that out of the box?
The authors of the future package do not recommend changing the plan in package code (see help(future::plan())). I am not sure we really want to modify the plan. Why can't we leave to the user and they can set an appropriate plan, e.g. in their .Rprofile? Also, what numbers of cores should/are used with your suggestion. For styling one file or for testing, it's desirable to not turn multicore support on.

Oct 20 '20 12:10 lorenzwalthert

Codecov Report

Merging #682 into master will decrease coverage by 0.27%. The diff coverage is 30.00%.

@@            Coverage Diff             @@
##           master     #682      +/-   ##
==========================================
- Coverage   90.43%   90.15%   -0.28%     
==========================================
  Files          47       47              
  Lines        2226     2234       +8     
==========================================
+ Hits         2013     2014       +1     
- Misses        213      220       +7

Impacted Files	Coverage Δ
R/transform-files.R	`93.91% <30.00%> (-6.09%)`	:arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c50338f...e8615e8. Read the comment docs.

Oct 21 '20 08:10 codecov-io

furrr::future_map_lgl() is a drop-in replacement for map_lgl(), gives a progress bar, and has seen an update very recently. It's a suggested package, the default is to revert to serial styling. (We might want to show a one-time message, though, and/or offer a parallel = NA argument.
It just works with furrr.
From ?plan:

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using: ...

and this is what we do here. I'd say this improves user experience -- users who style their code might not want to even learn about future and whatnot.

Oct 21 '20 08:10 krlmlr

I see.

With regard to the future strategy, I'd like to 1) check if a plan has been set explicitly, if yes, follow this (I don't know if this is possible?). If not, set temporarily to future::plan("multiprocess") and restore on exit. Because with the current implementation and furrr installed, the user has no option to decide on the strategy, which I think is against the spirit of future (apart from setting the testthat env variable, but that shouldn't be the way to avoid a certain future strategy).
Also, setting the future strategy to plan('multiprocess') on my macOS almost takes 2s, so we should really only do it if there are two or more files to style.
Regarding progress bar, I note that the .progress argument will be deprecated in the future in favour of an approach compatible with the progressr package, so not sure we should introduce this here. It is a very visible user-facing change. As I understand from https://github.com/HenrikBengtsson/progressr/issues/78, progressr is only at the beginning and at the moment, you can either force progress display as a developer (which is against the principles of progressr) or ask the user to wrap every call to styler into with_progress(), which 99% of the users won't do. Other global options are potentially considered later.
Another problem is that we don't get real time updates on the styled files as with sequential processing. Seems like a future.apply approach with progressr also does not allow this, so we might need to drop it.

slow_sqrt <- function(xs) {
  p <- progressor(along = xs)
  future.apply::future_lapply(xs, function(x) {
    message("Calculating the square root of ", x)
    Sys.sleep(2)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

library(progressr)
handlers("progress")
with_progress(y <- slow_sqrt(1:10))

No matter how we decide in the end, we should add clear documentation and unit test various cases.

That said, I understand your point that 99% of the users probably don't want to bother with future strategies and progress bar settings but I think it's important to keep things transparent if possible.

Oct 21 '20 20:10 lorenzwalthert

OK, I will try to finish this up.

Nov 29 '20 18:11 lorenzwalthert

Thanks! Happy to review.

The startup costs are a problem. On Linux we could start one child process with callr, which then could spawn subprocesses very efficiently with the multicore plan. We could do the same on Windows and multiprocess: In both cases this would also eliminate the problem of temporarily altering the local plan. What do you think?

Nov 29 '20 19:11 krlmlr

But going through another sub-process will probably not decrease the start-up costs, would it? Because on top of future::plan(), a new R session has to be started, so costs add up? What would be the advantages of using a sub-process? To avoid the RStudio incompatibility of forked processes on UNIX? I don't think much can go wrong with capturing the plan and setting it on exit.

Nov 29 '20 20:11 lorenzwalthert

I thought about it again. There is an additional problem: When using future, setting a plan is time consuming, even when it's sequential. Hence, when there are less than 3 files to style and we don't set multisession, we should rather use map_lgl() instead of future_map_lgl(). Also, the user should have the option to fully control the process, as discussed in https://github.com/HenrikBengtsson/future/issues/450. For that reason< I think we should use this heuristic with the env variable R_STYLER_USE_FUTURE: (to be added to the docs):

I think we should make a short vignette for this.

Dec 05 '20 20:12 lorenzwalthert

Parallel styling

Codecov Report