rmarkdown::render crashes in parallel with parallel::makeForkCluster on macOS and Apple Silicon
Problem description
I am knitting multiple HTML documents from RMD files in parallel. Here is a small sample reproducing the bug:
library(foreach)
library(doParallel)
library(rmarkdown)
cluster <- parallel::makeForkCluster(bigstatsr::nb_cores())
# cluster <- parallel::makePSOCKcluster(bigstatsr::nb_cores())
doParallel::registerDoParallel(cluster)
foreach(temp_counter = seq(1, 10)) %dopar% {
  html_file <- paste0(temp_counter, ".html")
  Rmd_file <- paste0(temp_counter, ".Rmd")
  text_string <- paste0("---\n", "title: 'TEST'\n", "---\n", "```{r setup, echo=FALSE}\n", "print(temp_counter)\n", "```")
  write(text_string, file = Rmd_file, append = FALSE)
  rmarkdown::render(Rmd_file, output_format = "html_document", output_file = html_file)
}
parallel::stopCluster(cluster)
I receive error messages like this:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
objc[19006]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
- parallel::makePSOCKcluster works fine.
- It is only rmarkdown::render that does not work with parallel::makeForkCluster. The .Rmd files are generated properly with parallel::makeForkCluster.
System information
The bug appears on multiple versions of RStudio / rmarkdown / pandoc, including this one:
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5, RStudio 2024.4.2.764
Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
Package version:
base64enc_0.1.3 bslib_0.6.0 cachem_1.0.8 cli_3.6.2 digest_0.6.33 ellipsis_0.3.2 evaluate_0.23
fastmap_1.1.1 fontawesome_0.5.2 fs_1.6.3 glue_1.7.0 graphics_4.3.2 grDevices_4.3.2 highr_0.10
htmltools_0.5.7 jquerylib_0.1.4 jsonlite_1.8.7 knitr_1.43 lifecycle_1.0.4 magrittr_2.0.3 memoise_2.0.1
methods_4.3.2 mime_0.12 R6_2.5.1 rappdirs_0.3.3 rlang_1.1.4 rmarkdown_2.22 sass_0.4.7
stats_4.3.2 stringi_1.8.2 stringr_1.5.1 tinytex_0.49 tools_4.3.2 utils_4.3.2 vctrs_0.6.5
xfun_0.41 yaml_2.3.7
Pandoc version: 3.1.11
Checklist
When filing a bug report, please check the boxes below to confirm that you have provided us with the information we need. Have you:
- [x] formatted your issue so it is easier for us to read?
- [x] included a minimal, self-contained, and reproducible example?
- [x] pasted the output from xfun::session_info('rmarkdown') in your issue?
- [x] upgraded all your packages to their latest versions (including your versions of R, the RStudio IDE, and relevant R packages)?
- [ ] installed and tested your bug with the development version of the rmarkdown package using remotes::install_github("rstudio/rmarkdown")?
I don't know enough about using parallel with fork-based clusters, so this will be hard for me to investigate. I do know that running rmarkdown::render() in parallel is not ideal unless you copy the files and use an external process to render. render() needs to access and create intermediate files, and their names are not randomized, so conflicts can occur.
Related parallel issues:
- https://github.com/rstudio/rmarkdown/issues/1632
- https://github.com/rstudio/rmarkdown/issues/2454
- https://github.com/rstudio/rmarkdown/issues/499
- https://github.com/rstudio/rmarkdown/issues/1268
So this could be a new occurrence of a problem with using rmarkdown in parallel. I would say this is a limitation.
If you can get to the bottom of what is happening, we could think of a fix. Any help is appreciated on this one.
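The suggestion above (copy the files and render in an external process) could be sketched roughly as follows. This is only an illustration, not something rmarkdown provides: the helper name render_isolated is hypothetical, it assumes the callr package is installed, and it assumes the .Rmd file has no dependencies on neighbouring files, since only the .Rmd itself is copied.

```r
# Hypothetical helper: render one document in a fresh R process, with its own
# scratch directory, so that parallel renders do not share intermediate files.
render_isolated <- function(rmd_file, output_dir = dirname(rmd_file)) {
  # Private working directory for this render, removed when the function exits
  work_dir <- tempfile("render-")
  dir.create(work_dir)
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

  # Assumption: copying the .Rmd alone is enough (no images, child documents, ...)
  local_rmd <- file.path(work_dir, basename(rmd_file))
  file.copy(rmd_file, local_rmd)

  # callr::r() runs the function in a new R session (not a fork of this one)
  callr::r(
    function(input, out_dir, tmp_dir) {
      rmarkdown::render(
        input,
        output_dir = out_dir,
        intermediates_dir = tmp_dir,
        quiet = TRUE
      )
    },
    args = list(local_rmd, normalizePath(output_dir), work_dir)
  )
}
```

Because each render runs in a newly started R session, this sidesteps fork-related restrictions at the cost of extra start-up time per document.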
Thanks for the reply. I did look at the earlier issues you mentioned and tried some of the proposed fixes, to no avail. A few additional notes:
- The code I submitted uses a different .Rmd file per thread. I also tried putting these files in a different folder for each thread, but this did not fix the issue.
- The issue seems specific to macOS; we could not reproduce it on Linux.
- The bug is also present if I run the script from R in a terminal, outside of RStudio.
> The issue seems specific to macOS; we could not reproduce it on Linux.
This is really interesting. @yihui, as a Mac user, do you know anything about this type of run?
One more comment after further tests: the bug remains if I run only one instance in parallel:
foreach(temp_counter = seq(1, 1)) %dopar% { ... }
The error message makes little sense in this context.
This problem runs very deep... At bottom, it is caused by the default options(bitmapType = 'quartz') on macOS: quartz is an on-screen device. The default graphics device for R Markdown is png() when the output format is HTML, and png() uses options(bitmapType) as the default value for its type argument. As a result, you end up calling an on-screen device in the forked processes, which leads to the errors you saw.
From ?parallel::mcfork:
Child processes should never use on-screen graphics devices.
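To see which backend a given session would pick up, a minimal check using only base R might look like this (the helper name bitmap_backend_info is hypothetical):

```r
# Hypothetical helper: report the bitmap backend that png() would default to,
# and whether this R build can use cairo-based devices instead.
bitmap_backend_info <- function() {
  list(
    bitmap_type = getOption("bitmapType"),       # default `type` for png()
    has_cairo   = unname(capabilities("cairo"))  # TRUE if cairo is available
  )
}

info <- bitmap_backend_info()
str(info)
```

If bitmap_type is "quartz" and has_cairo is TRUE, switching to cairo should be possible in that session.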
If you set options(bitmapType = 'cairo') before you call rmarkdown::render(), the previous errors should disappear. However, I get these errors:
pandoc: /var/folders/.../T//Rtmp.../rmarkdown-str....html: withBinaryFile: does not exist (No such file or directory)
This might have revealed a bug in rmarkdown when running render() in parallel.
Thanks for the reply. Setting options(bitmapType = 'cairo') fixed the last of the three bugs. The bug pandoc: /var/folders/.../T//Rtmp.../rmarkdown-str....html: withBinaryFile: does not exist (No such file or directory) that you mentioned was fixed here: https://github.com/rstudio/rmarkdown/issues/1632.
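For anyone landing here, the workaround from this thread can be wrapped up as a small sketch. The names render_in_fork and render_fun are hypothetical, and the code assumes that R was built with cairo support. The render_fun argument exists only so the wrapper can be exercised without pandoc; by default it is rmarkdown::render.

```r
# Hypothetical wrapper for use inside forked workers: switch the bitmap
# backend to cairo for the duration of the render, then restore the old value.
render_in_fork <- function(rmd_file, html_file,
                           render_fun = rmarkdown::render) {
  # Assumption: capabilities("cairo") is TRUE for this R build
  old <- options(bitmapType = "cairo")  # avoid the on-screen quartz device
  on.exit(options(old), add = TRUE)     # restore the previous setting

  render_fun(
    rmd_file,
    output_format = "html_document",
    output_file = html_file
  )
}
```

In the reproducer at the top of this issue, this would replace the direct rmarkdown::render(...) call inside the %dopar% body, for example render_in_fork(Rmd_file, html_file).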