rmarkdown

pandoc: openBinaryFile: does not exist (No such file or directory) when using render() in parallel

Open gorgitko opened this issue 4 years ago • 26 comments


By filing an issue to this repo, I promise that

  • [x] I have fully read the issue guide at https://yihui.name/issue/.
  • [x] I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('rmarkdown'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/rmarkdown').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • [x] I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.


When I use rmarkdown::render() inside BiocParallel::bplapply(), Pandoc throws this error: pandoc: /tmp/RtmpW06rTD/rmarkdown-str3bc26dd971b5.html: openBinaryFile: does not exist (No such file or directory). I am using Pandoc 2.7.3 and the development version of rmarkdown. Everything works fine when I use BPPARAM = SerialParam() in bplapply() (i.e. when parallel processing is disabled).

to_render.Rmd:

---
output:
  html_document:
    code_folding: "none"
    df_print: "paged"
    toc: false
    number_sections: false
    theme: "united"
    self_contained: true
params: 
  title: "Document"
---

---
title: `r params$title`
---

```{r}
DT::datatable(
  mtcars,
  filter = "top",
  width = "100%",
  class = "display compact"
)
```

render.R:

library(glue)
library(rmarkdown)
library(BiocParallel)

N_CPUS <- 8

OUTPUT_DIR <- "rendered"
OUTPUT_FILES <- 1:10

BPPARAM <- MulticoreParam(workers = N_CPUS)

dir.create(OUTPUT_DIR, showWarnings = FALSE)

bplapply(OUTPUT_FILES, function(i) {
  intermediates_dir <- glue("{i}_intermediates_dir")

  render(
    "to_render.Rmd",
    output_file = glue("{i}.html"),
    output_dir = OUTPUT_DIR,
    params = list(title = glue("Document {i}")),
    intermediates_dir = intermediates_dir
  )

  system(glue("rm -r {intermediates_dir}"))
}, BPPARAM = BPPARAM)

Output from render.R:

> library(glue)

> library(rmarkdown)

> library(BiocParallel)

> N_CPUS <- 8

> OUTPUT_DIR <- "rendered"

> OUTPUT_FILES <- 1:10

> BPPARAM <- MulticoreParam(workers = N_CPUS)

> dir.create(OUTPUT_DIR, showWarnings = FALSE)

> bplapply(OUTPUT_FILES, function(i) {
+   intermediates_dir <- glue("{i}_intermediates_dir")
+ 
+   render(
+     "to_render.Rmd",
+     output_file  .... [TRUNCATED] 


processing file: to_render.Rmd
  |................................                                 |  50%
  ordinary text without R code

  |.................................................................| 100%
label: unnamed-chunk-1

output file: /ssd/ownCloud/temp/rmarkdown_parallel_reprex/3_intermediates_dir/to_render.knit.md

/usr/bin/pandoc +RTS -K512m -RTS /ssd/ownCloud/temp/rmarkdown_parallel_reprex/3_intermediates_dir/to_render.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output /ssd/ownCloud/temp/rmarkdown_parallel_reprex/rendered/3.html --email-obfuscation none --self-contained --standalone --section-divs --template /home/novotnyj/R/x86_64-pc-linux-gnu-library/3.6/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:united' --include-in-header /tmp/Rtmpiblzto/rmarkdown-str42fb6ef4dcc8.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --metadata pagetitle=/ssd/ownCloud/temp/rmarkdown_parallel_reprex/3_intermediates_dir/to_render.utf8.md 

Output created: rendered/3.html
pandoc: /tmp/Rtmpiblzto/rmarkdown-str42fc5c014392.html: openBinaryFile: does not exist (No such file or directory)


processing file: to_render.Rmd
  |................................                                 |  50%
  ordinary text without R code

  |.................................................................| 100%
label: unnamed-chunk-1

output file: /ssd/ownCloud/temp/rmarkdown_parallel_reprex/4_intermediates_dir/to_render.knit.md

/usr/bin/pandoc +RTS -K512m -RTS /ssd/ownCloud/temp/rmarkdown_parallel_reprex/4_intermediates_dir/to_render.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output /ssd/ownCloud/temp/rmarkdown_parallel_reprex/rendered/4.html --email-obfuscation none --self-contained --standalone --section-divs --template /home/novotnyj/R/x86_64-pc-linux-gnu-library/3.6/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:united' --include-in-header /tmp/Rtmpiblzto/rmarkdown-str42fc698c38a7.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --metadata pagetitle=/ssd/ownCloud/temp/rmarkdown_parallel_reprex/4_intermediates_dir/to_render.utf8.md 

Output created: rendered/4.html
pandoc: /tmp/Rtmpiblzto/rmarkdown-str42fa67445478.html: openBinaryFile: does not exist (No such file or directory)
pandoc: /tmp/Rtmpiblzto/rmarkdown-str42ff5be0d7db.html: openBinaryFile: does not exist (No such file or directory)
pandoc: /tmp/Rtmpiblzto/rmarkdown-str42fd4840f002.html: openBinaryFile: does not exist (No such file or directory)
Error: BiocParallel errors
  element index: 2, 5, 7, 10
  first error: pandoc document conversion failed with error 1

Pandoc and session info:

> pandoc_exec()
[1] "/usr/bin/pandoc"
> pandoc_version()
[1] ‘2.7.3’
> xfun::session_info()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19, RStudio 1.2.1335

Locale:
  LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=en_US.UTF-8   
  LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       

Package version:
  base64enc_0.1.3      BH_1.69.0.1          BiocParallel_1.18.1  compiler_3.6.1       digest_0.6.20        evaluate_0.14        formatR_1.7          futile.logger_1.4.3 
  futile.options_1.0.1 glue_1.3.1           graphics_3.6.1       grDevices_3.6.1      highr_0.8            htmltools_0.3.6      jsonlite_1.6         knitr_1.24          
  lambda.r_1.2.3       magrittr_1.5         markdown_1.1         methods_3.6.1        mime_0.7             parallel_3.6.1       Rcpp_1.0.2           rmarkdown_1.15.1    
  snow_0.4.3           stats_3.6.1          stringi_1.4.3        stringr_1.4.0        tinytex_0.15         tools_3.6.1          utils_3.6.1          xfun_0.9            
  yaml_2.2.0    

gorgitko, Aug 28 '19

I believe this may be the same issue as https://github.com/rstudio/rmarkdown/issues/1268 and https://github.com/rstudio/rmarkdown/issues/701, as well as

https://stackoverflow.com/questions/48161177/r-markdown-openbinaryfile-does-not-exist-no-such-file-or-directory

and

https://community.rstudio.com/t/fail-to-generate-file-in-rmarkdwon-openbinaryfile-does-not-exist-no-such-file-or-directory/34913

In my opinion, the issue has still not been completely resolved. I'm running into this error myself in a much simpler situation, with no network drives or other unusual configurations.

abalter, Oct 23 '19

I can confirm this is happening on a local computer using local drives, not network drives.

gorgitko, Oct 24 '19

So I examined a bit how rmarkdown actually uses Pandoc, and in particular which temporary file Pandoc cannot find:

pandoc: /tmp/RtmpGXCEhW/rmarkdown-str7381eaa5d53.html: openBinaryFile: does not exist (No such file or directory)

I found that this file is probably used for navbars, but more important is this function:

# temp files created by as_tmpfile() cannot be immediately removed because they
# are needed later by the pandoc conversion; we have to clean up the temp files
# that have the pattern specified in `tmpfile_pattern` when render() exits
clean_tmpfiles <- function() {
  unlink(list.files(
    tempdir(), sprintf("^%s[0-9a-f]+[.]html$", tmpfile_pattern), full.names = TRUE
  ))
}

which is called in render():

  # render() may call itself, e.g., in discover_rmd_resources(); in this case,
  # we should not clean up temp files in the nested render() call, but wait
  # until the top-level render() exits to clean up temp files
  .globals$level <- .globals$level + 1L  # increment level in a nested render()
  on.exit({
    .globals$level <- .globals$level - 1L
    if (.globals$level == 0) clean_tmpfiles()
  }, add = TRUE)

So what is actually happening? When a top-level render() call finishes, clean_tmpfiles() removes every rmarkdown temporary file in tempdir() that matches the pattern. Because parallel workers forked from the same R session share the same tempdir(), the first render() to finish also deletes the temporary files that other, still-running render() calls need.
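The mechanism can be reproduced without Pandoc. The base-R sketch below is a simplified, single-process simulation, not rmarkdown's code: two "jobs" create temp files in one shared directory, and the first job to finish runs the same pattern-based cleanup as clean_tmpfiles(). Only the `rmarkdown-str` prefix is taken from the file names in the errors above; `shared_tmp`, `job_file()` and `clean_shared()` are illustrative names.

```r
# Simplified simulation of the race (base R only, no rmarkdown or Pandoc).
# All names are illustrative; only the file-name pattern mirrors rmarkdown.
tmpfile_pattern <- "rmarkdown-str"

# A directory both "jobs" share, like forked workers share tempdir()
shared_tmp <- file.path(tempdir(), "shared_tmp_demo")
dir.create(shared_tmp, showWarnings = FALSE)

# Each job writes its own temp file, e.g. an --include-in-header snippet
job_file <- function() {
  f <- tempfile(tmpfile_pattern, tmpdir = shared_tmp, fileext = ".html")
  writeLines("<!-- header include -->", f)
  f
}

# Same matching logic as clean_tmpfiles(), applied to the shared directory
clean_shared <- function() {
  unlink(list.files(
    shared_tmp, sprintf("^%s[0-9a-f]+[.]html$", tmpfile_pattern), full.names = TRUE
  ))
}

file_a <- job_file()  # job A starts
file_b <- job_file()  # job B starts while A is still running
clean_shared()        # job A finishes and cleans up
file.exists(file_b)   # FALSE: job B's file is gone before Pandoc can read it
```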

I can confirm that this dirty workaround works (put it after library(rmarkdown)):

clean_tmpfiles_mod <- function() {
  message("Calling clean_tmpfiles_mod()")
}

assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")

It would be great if the developers added something like a clean_tmpfiles = TRUE parameter to render(), so that users could turn the automatic cleanup off and call clean_tmpfiles() themselves.
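A minimal base-R sketch of what that proposal could look like. `fake_render()` and its `clean_tmpfiles` argument are hypothetical stand-ins that only mimic the temp-file handling; they are not the real render() API, and `clean_rmd_tmpfiles()` is an illustrative copy of the pattern-based cleanup.

```r
# Hypothetical sketch of the proposal: cleanup that callers can switch off
# and run once themselves. These functions only mimic rmarkdown's behavior.
tmp_root <- file.path(tempdir(), "proposal_demo")
dir.create(tmp_root, showWarnings = FALSE)

# Pattern-based cleanup, like clean_tmpfiles(), limited to tmp_root
clean_rmd_tmpfiles <- function(dir = tmp_root) {
  unlink(list.files(dir, "^rmarkdown-str[0-9a-f]+[.]html$", full.names = TRUE))
}

# Stand-in for render(): creates a temp file and optionally cleans up on exit
fake_render <- function(clean_tmpfiles = TRUE) {
  f <- tempfile("rmarkdown-str", tmpdir = tmp_root, fileext = ".html")
  writeLines("<!-- header include -->", f)
  if (clean_tmpfiles) on.exit(clean_rmd_tmpfiles(), add = TRUE)
  # The real render() would run Pandoc here, reading f
  f
}

# Parallel-style use: no job deletes shared temp files when it finishes ...
files <- vapply(1:3, function(i) fake_render(clean_tmpfiles = FALSE), character(1))
all(file.exists(files))  # TRUE
# ... and the caller cleans up once, after all jobs are done
clean_rmd_tmpfiles()
any(file.exists(files))  # FALSE
```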

Full modified render.R:

library(glue)
library(rmarkdown)
library(BiocParallel)

clean_tmpfiles_mod <- function() {
  message("Calling clean_tmpfiles_mod()")
}

assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")

N_CPUS <- 8

OUTPUT_DIR <- "rendered"
OUTPUT_FILES <- 1:10

BPPARAM <- MulticoreParam(workers = N_CPUS)

dir.create(OUTPUT_DIR, showWarnings = FALSE)

bplapply(OUTPUT_FILES, function(i) {
  intermediates_dir <- glue("{i}_intermediates_dir")

  render(
    "to_render.Rmd",
    output_file = glue("{i}.html"),
    output_dir = OUTPUT_DIR,
    params = list(title = glue("Document {i}")),
    intermediates_dir = intermediates_dir
  )

  system(glue("rm -r {intermediates_dir}"))
}, BPPARAM = BPPARAM)

gorgitko, Oct 24 '19

We're using a basic Pandoc script in bash to merge multiple files:

pandoc lf/lf_01.txt master.md lf/lf_02.txt master_fr.md lf/lf_03.txt master_es.md lf/lf_04.txt master_pt.md lf/lf_05.txt master_de.md lf/lf_06.txt master_it.md lf/lf_07.txt master_ja.md lf/lf_08.txt master.md lf/lf_09.txt > output.html

Is it possible to add a function that prevents clearing the temp files, which causes the "openBinaryFile: does not exist (No such file or directory)" error, in this setup?

JayMMTL, Apr 29 '20

@JayMMTL I am not sure how this is connected to rendering Rmds from within R. If it is, you can use the snippet I provided to replace the clean_tmpfiles() function, which causes this problem.

gorgitko, Apr 30 '20

@gorgitko Your solution worked perfectly for me, thanks. I'm working on a Linux VM, building R Markdown files in parallel. I spent the whole afternoon trying to figure this out, and in the end your black-magic dirty solution did the trick. Thanks!

Nicktz, May 18 '20

Thanks @gorgitko for explaining the root cause. Unfortunately, your solution does not work on my MacBook Pro 16. It does call your mod function, but error 1 is thrown before that:

<div id=: openBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1
Calling clean_tmpfiles_mod()

I also tried turning off a flag mentioned in several posts: create_report(iris, config = configure_report(add_plot_str = FALSE))

SessionInfo:

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] knitr_1.28.7             data.table_1.12.8        DataExplorer_0.8.1       rmarkdown_2.1.4         
 [5] likert_1.3.5             xtable_1.8-4             machinelearningtools_0.1 forcats_0.5.0           
 [9] stringr_1.4.0            dplyr_0.8.99.9003        purrr_0.3.4              readr_1.3.1             
[13] tidyr_1.0.2              tibble_3.0.1             ggplot2_3.3.0            tidyverse_1.3.0.9000    
[17] magrittr_1.5             googlesheets_0.3.0       psych_1.9.12.31         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      lubridate_1.7.8   lattice_0.20-38   assertthat_0.2.1  packrat_0.5.0    
 [6] digest_0.6.25     utf8_1.1.4        R6_2.4.1          cellranger_1.1.0  plyr_1.8.6       
[11] backports_1.1.7   reprex_0.3.0      evaluate_0.14     highr_0.8         httr_1.4.1       
[16] pillar_1.4.4      rlang_0.4.6.9000  curl_4.3          readxl_1.3.1      rstudioapi_0.11  
[21] labeling_0.3      htmlwidgets_1.5.1 igraph_1.2.5      munsell_0.5.0     broom_0.5.6      
[26] compiler_3.6.2    modelr_0.1.6      xfun_0.13         pkgconfig_2.0.3   mnormt_1.5-5     
[31] htmltools_0.4.0   tidyselect_1.1.0  gridExtra_2.3     fansi_0.4.1       crayon_1.3.4     
[36] dbplyr_1.4.3      withr_2.2.0       grid_3.6.2        nlme_3.1-142      jsonlite_1.6.1   
[41] gtable_0.3.0      lifecycle_0.2.0   DBI_1.1.0         scales_1.1.1      cli_2.0.2        
[46] stringi_1.4.6     farver_2.0.3      reshape2_1.4.4    fs_1.4.1          xml2_1.3.1       
[51] ellipsis_0.3.1    generics_0.0.2    vctrs_0.3.0.9000  tools_3.6.2       glue_1.4.1       
[56] networkD3_0.4     hms_0.5.3         yaml_2.2.1        parallel_3.6.2    colorspace_1.4-1 
[61] rvest_0.3.5       haven_2.2.0 

Any ideas??

agilebean, May 19 '20

@agilebean I am not sure if clean_tmpfiles_mod() is correctly replacing the original clean_tmpfiles(). It should echo "Calling clean_tmpfiles_mod()". Make sure you start with a clean environment and try my example first.

gorgitko, May 19 '20

I continue to get random messages about files not existing.

dcaud, Dec 11 '20

Some version of an insistently retrying function may work (although clearly not ideal, it seems to be helping me avoid this somewhat random error):

library(purrr)
rate <- rate_backoff(pause_base = 0.1, pause_min = 0.005, max_times = 10)
insistent_render <- insistently(rmarkdown::render, rate, quiet = FALSE)

Then call insistent_render() instead of rmarkdown::render().

dcaud, Dec 12 '20

Any suggestions on how to fix this? I am running R version 4.0.4.

pandoc.exe: \: openBinaryFile: does not exist (No such file or directory)
Warning: Error in : pandoc document conversion failed with error 1
  128: stop
  127: pandoc_convert
  126: convert
  125: render
  124: discover_rmd_resources
  123: find_external_resources
  122: copy_render_intermediates
  121: output_format$intermediates_generator
  120: <Anonymous>
  115:
  99: doc
  98: renderUI
  97: func
  84: renderFunc
  83: output$reactivedoc
  3: <Anonymous>
  1: rmarkdown::run

ktd2001, Mar 17 '21

@ktd2001 Which OS are you on? Which Pandoc version do you use (see rmarkdown::pandoc_version())? Where are your files located? On a network drive?

Can you share the file that has the issue so we can try to reproduce it, along with anything else that could help us understand?
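One way to gather these details from the R console (rather than searching the file system) is a small helper like the one below. `pandoc_info()` is a hypothetical name; the sketch assumes only rmarkdown's exported `pandoc_available()`, `pandoc_version()` and `pandoc_exec()` functions, and returns `NULL` if rmarkdown is not installed.

```r
# Hypothetical helper: collect Pandoc details from the R console.
# Uses rmarkdown's pandoc_available(), pandoc_version() and pandoc_exec().
pandoc_info <- function() {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    return(NULL)  # rmarkdown is not installed
  }
  if (!rmarkdown::pandoc_available()) {
    return(list(available = FALSE))
  }
  list(
    available = TRUE,
    version = as.character(rmarkdown::pandoc_version()),  # e.g. "2.7.3"
    path = rmarkdown::pandoc_exec()                        # binary rmarkdown uses
  )
}

str(pandoc_info())
Sys.info()[["sysname"]]  # OS family, e.g. "Windows", "Linux" or "Darwin"
```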

cderv, Mar 17 '21

Hi Chris, thank you for responding so quickly. I am using Windows 10. Here are the file and the dataset, which are located in the same folder as the NB.

I could not find which Pandoc version I use (rmarkdown::pandoc_version()). How would I find this? I searched my files, but the search is still running.

My best, Keiana


ktd2001, Mar 18 '21

@cderv I am just curious whether anyone at RStudio has looked into the solution I proposed in https://github.com/rstudio/rmarkdown/issues/1632#issuecomment-545824711

TLDR: Allow users to specify whether clean_tmpfiles() will be run after render() has finished. Possibly allow users to run it manually.

Anyway, I think each rendered Rmd should get its own directory in tempdir(); that would basically avoid this problem.

Thank you for looking at this 🙂

gorgitko, Mar 18 '21

I cleared the temporary files and am still getting the same error message:

pandoc.exe: \: openBinaryFile: does not exist (No such file or directory)
Warning: Error in : pandoc document conversion failed with error 1
  128: stop
  127: pandoc_convert
  126: convert
  125: render
  124: discover_rmd_resources
  123: find_external_resources
  122: copy_render_intermediates
  121: output_format$intermediates_generator
  120: <Anonymous>
  115:
  99: doc
  98: renderUI
  97: func
  84: renderFunc
  83: output$reactivedoc
  3: <Anonymous>
  1: rmarkdown::run


ktd2001, Mar 18 '21

@gorgitko I tried your solution, but I still get the error intermittently.

Note that this works correctly in my local RStudio session, but not always in a Docker container (rocker/geospatial) used for continuous integration on GitLab. If I re-run the same CI job, it works again.

In addition: Warning message:
In readLines(con, warn = FALSE) :
  cannot open file 'C03-planning.utf8.md': No such file or directory
Calling clean_tmpfiles_mod()
Execution halted
Calling clean_tmpfiles_mod()
pandoc: C01-bonjour.utf8.md: openBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1

Note that I created an R file to run each render in parallel. The function that writes and runs it looks like this:

knit_file_system <- function(x) {
  # Write a small R script that renders `x` in a fresh R session
  runR <- tempfile(fileext = ".R")

  cat(
    paste0(
      # Reuse the current library paths in the child session
      paste0(".libPaths(c(\"", paste(.libPaths(), collapse = "\", \""), "\"));"),
      # Avoid https://github.com/rstudio/rmarkdown/issues/1632#issuecomment-545824711
      '
      library(rmarkdown)
      clean_tmpfiles_mod <- function() {
        message("Calling clean_tmpfiles_mod()")
      }
      assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")
      ',
      # Quote the (escaped) path and separate it from the other arguments
      'render("',
      gsub("\\", "\\\\", x, fixed = TRUE),
      '", envir = new.env(), encoding = "UTF-8")'
    ),
    file = runR
  )

  # Run the script with the Rscript binary of the current R installation
  rscript <- normalizePath(file.path(Sys.getenv("R_HOME"), "bin", "Rscript"), mustWork = FALSE)
  system(paste(shQuote(rscript), shQuote(runR)))
}

Then call it with {future}, via future_imap() from the furrr package:

   library(future)
   library(furrr)
   future::plan(future::multicore)

   all_my_rmds <- list.files(pattern = "[.]Rmd")
   future_imap(all_my_rmds,
     ~try(knit_file_system(.x)),
     .progress = TRUE)

statnmap, Mar 22 '21

@statnmap Does it also happen if you explicitly run each render() in a new session, e.g. using xfun::Rscript_call() or callr::r()?

If the issue is really with each render having its own tempdir(), this could solve it.

I am thinking more and more about providing a way to run render() in a new session (for example, rmarkdown::render(..., new_session = TRUE)) to mimic the Knit button. Parallel use of render() could be another argument in favor of it, if it works better this way. 🤔
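The reason a new session helps can be checked with base R alone: every fresh R session gets its own randomly named tempdir(), so the pattern-based cleanup in one session cannot reach the temp files of another. The sketch below starts two child sessions with Rscript and compares their temp directories; `child_tempdir()` is an illustrative name, and the sketch makes no assumption about how a `new_session` option would actually be exposed.

```r
# Sketch (base R): every new R session gets its own tempdir(), so renders
# run in separate sessions cannot delete each other's rmarkdown temp files.
rscript <- file.path(R.home("bin"), "Rscript")

child_tempdir <- function() {
  # Start a fresh R session and capture the tempdir() it was given
  out <- system2(rscript, c("-e", shQuote("writeLines(tempdir())")), stdout = TRUE)
  tail(out, 1)
}

dir_a <- child_tempdir()
dir_b <- child_tempdir()
c(parent = tempdir(), child_a = dir_a, child_b = dir_b)  # three different paths
```

In practice, helpers such as callr::r() wrap this pattern; for example, callr::r(function(input) rmarkdown::render(input), args = list("to_render.Rmd")) would render in a child process, at the cost of starting a new R session per document.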

cderv, Mar 22 '21

I already use system() to run each render in a new session; I have updated my code above to include this missing part.

statnmap, Mar 22 '21

@cderv

If the issue is really with each render having its own tempdir(), this could solve it.

This is definitely the issue (i.e. a common tempdir() shared by all render() calls evaluated in the same R session), but letting users control when clean_tmpfiles() runs would add much less overhead than starting a new session. Alternatively, creating a randomly named directory inside tempdir() for each render() call would also solve this (or at least users could be given the option to do that).
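The second suggestion can be sketched in base R. Here `new_job_dir()` and `clean_job_tmpfiles()` are hypothetical helpers, not rmarkdown functions: each job gets a random directory inside tempdir() from tempfile(), and the cleanup only looks inside that job's directory, so one job finishing no longer removes another job's files.

```r
# Hypothetical sketch: one randomly named directory per render() call
# inside tempdir(), with cleanup limited to that directory.
# new_job_dir() and clean_job_tmpfiles() are illustrative, not rmarkdown API.
new_job_dir <- function() {
  d <- tempfile("render-job-")  # random, unique path inside tempdir()
  dir.create(d)
  d
}

clean_job_tmpfiles <- function(job_dir) {
  # Same file-name pattern as clean_tmpfiles(), but only inside job_dir
  unlink(list.files(job_dir, "^rmarkdown-str[0-9a-f]+[.]html$", full.names = TRUE))
}

# Two concurrent jobs, each writing a temp file into its own directory
job_a <- new_job_dir()
job_b <- new_job_dir()
file_a <- tempfile("rmarkdown-str", tmpdir = job_a, fileext = ".html")
file_b <- tempfile("rmarkdown-str", tmpdir = job_b, fileext = ".html")
writeLines("<!-- job A -->", file_a)
writeLines("<!-- job B -->", file_b)

clean_job_tmpfiles(job_a)  # job A finishes first
file.exists(file_b)        # TRUE: job B's file is untouched
```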

gorgitko, Mar 22 '21

Now I know why I still have this problem: I knit the same file twice in parallel, so one process deletes the .utf8.md file while the other process is still trying to read it.

statnmap, Mar 22 '21

@statnmap That's why I am using a unique intermediates_dir for each render() call in my code snippet:

intermediates_dir <- glue("{i}_intermediates_dir")
render(..., intermediates_dir = intermediates_dir)
# Cleaning.
system(glue("rm -r {intermediates_dir}"))

It's also possible to use completely random intermediate dirs, e.g.:

paste0("intermediates_", stringi::stri_rand_strings(1, 10))
[1] "intermediates_HaPxZbAKXY"
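A base-R variant of the same idea, as a hedged sketch: `render_isolated()` is a hypothetical wrapper, not part of rmarkdown. It uses tempfile() for the random name (instead of stringi) and on.exit(unlink(..., recursive = TRUE)) instead of system("rm -r ..."), so the directory is removed even if render() fails. The `render_fun` argument exists only so the wrapper can be exercised without rmarkdown installed. This isolates intermediate files such as the .utf8.md; the pattern-based cleanup of tempdir() discussed earlier in this thread is a separate mechanism.

```r
# Hypothetical wrapper, not part of rmarkdown: gives every render() call its
# own randomly named intermediates directory and removes it on exit.
render_isolated <- function(input, ..., render_fun = rmarkdown::render) {
  # tempfile() returns a unique, random path; create it as a directory
  intermediates_dir <- tempfile("intermediates_")
  dir.create(intermediates_dir, recursive = TRUE)
  # Portable replacement for system("rm -r ..."), runs even on error
  on.exit(unlink(intermediates_dir, recursive = TRUE), add = TRUE)
  render_fun(input, ..., intermediates_dir = intermediates_dir)
}
```

In the bplapply() loop from the issue, the call would then become render_isolated("to_render.Rmd", output_file = glue("{i}.html"), output_dir = OUTPUT_DIR, params = list(title = glue("Document {i}"))), with no separate cleanup step.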

gorgitko, Mar 23 '21

FYI, this issue caused a major problem with some work products: rendering a batch of R Markdown files (Rmd to LaTeX via Pandoc) in parallel caused some of the files to have a file name that didn't match their content. Until this is fixed, there should be a warning or an error when render() is run in parallel.

Using @gorgitko's method to create unique intermediate directories solved the issue, so the fix should implement this.

jzadra, Sep 20 '22