blogdown

Enable optional parallel site building

Open jozefhajnala opened this issue 5 years ago • 5 comments

This PR proposes an option for build_site() to use multiple R processes running in parallel, relying only on the parallel package that ships with base R. On multi-core hardware common in 2019, this can speed up site builds significantly.

To enable parallelization, the user must set two options:

options(
  blogdown.use_parallel = TRUE,
  blogdown.use_parallel.cores = 4L  # e.g. 4 R processes; set this to suit your machine
)

Parallel building is only triggered if all of the following hold:

  • the option blogdown.use_parallel is TRUE
  • the option blogdown.use_parallel.cores is > 1
  • more than one file is to be built (length(files) > 1)
  • the parallel package is available
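The trigger conditions above can be sketched in plain R as a dispatch that falls back to sequential building whenever any condition fails. This is a minimal sketch, not the PR's actual code: the helper name `build_files_maybe_parallel()` and its `build_one` argument (a function that builds a single file) are hypothetical. It uses a socket cluster via `parallel::makeCluster()` and `parLapply()`, which also works on Windows, rather than fork-based `mclapply()`; that backend choice is an assumption.

```r
# Hypothetical sketch: build files in parallel only when all four
# conditions from the PR description hold; otherwise build sequentially.
build_files_maybe_parallel <- function(files, build_one) {
  use_parallel <- isTRUE(getOption("blogdown.use_parallel", FALSE))
  cores <- getOption("blogdown.use_parallel.cores", 1L)
  if (use_parallel &&
      is.numeric(cores) && cores > 1 &&
      length(files) > 1 &&
      requireNamespace("parallel", quietly = TRUE)) {
    # Never start more worker processes than there are files to build
    cl <- parallel::makeCluster(min(cores, length(files)))
    # Shut the workers down even if a build fails
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, files, build_one)
  } else {
    # Sequential fallback: same result shape (a list), one file at a time
    lapply(files, build_one)
  }
}
```

Returning a list in both branches keeps the caller unaware of whether the work ran in parallel.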

If this direction of functionality is accepted, we can make the implementation less conservative and easier to use.

jozefhajnala, May 15 '19 17:05

I would consider this ready for review; I am just not sure about the "UI" for using the feature. I prefer introducing new features conservatively, but I am happy to change that.

jozefhajnala, May 16 '19 17:05

@yihui, is there anything you are waiting on from my side?

jozefhajnala, May 30 '19 07:05

Just some thoughts about letting the user choose how to parallelize. 💭

  • The future package is really helpful for that. It allows a clean separation between what is parallelized and how it is run. However, it would mean adding it as a dependency (at least in Suggests), which could be too heavy 🤔
  • A vignette with examples of how to use this option could be interesting. It would show build_rmds_parallel and explain how it builds on the parallel package. Users could then either copy and paste from the vignette or at least know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature, by the way!

cderv, Jun 15 '19 10:06


Thanks for the feedback!

  • The PR as currently proposed already lets you choose your own way of parallelizing: you could set options("blogdown.build_rmds" = my_parallelization_fun), where my_parallelization_fun() uses the future package, without blogdown itself having to depend on it
  • I will happily spend some time writing a vignette if we get the PR merged ;-)
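A user-supplied hook of the kind described above might look roughly like the sketch below. It rests on assumptions not confirmed by the PR: that the `blogdown.build_rmds` hook is called with a character vector of file paths, and that `render_one()` (hypothetical) stands in for whatever renders a single file. It calls `future.apply::future_lapply()` when the future.apply package is installed, so the backend is picked separately through `future::plan()`, and falls back to `lapply()` otherwise.

```r
# Hypothetical stand-in for rendering a single .Rmd file
render_one <- function(file) {
  message("Rendering ", file)
  file
}

# Hypothetical hook: parallelize with future.apply if it is installed,
# otherwise render the files one after another.
my_parallelization_fun <- function(files) {
  if (requireNamespace("future.apply", quietly = TRUE)) {
    # How the work runs (sequential, multisession, cluster, ...) is chosen
    # by the user elsewhere, e.g. future::plan(future::multisession)
    future.apply::future_lapply(files, render_one)
  } else {
    lapply(files, render_one)
  }
}

# Register the hook; the dependency on future stays on the user's side
options(blogdown.build_rmds = my_parallelization_fun)
```

This keeps the "what" (render each file) in the hook and leaves the "how" to the user's future plan.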

jozefhajnala, Jun 18 '19 18:06

Hi @yihui, do we still want to move this forward?

jozefhajnala, Sep 25 '19 18:09