knitr

Error when purl-ing a file that contains read_chunk(extra parentheses)

Open louisahsmith opened this issue 5 years ago • 9 comments

I have a document test.Rmd which consists only of:

```{r}
knitr::read_chunk(here::here("code.R"))
```

(`code.R` exists but could be empty.) When I run `knitr::purl("test.Rmd")` I get:

```
processing file: test.Rmd
  |.................................................................| 100%
Quitting from lines 2-3 (test.Rmd)
Error in parse(text = code, keep.source = FALSE) :
  <text>:2:0: unexpected end of input
1: read_chunk(here::here("code.R")
   ^
```

When test.Rmd instead contains `knitr::read_chunk("code.R")`, it runs with no problem.

The problem seems to be the process_tangle.block function, specifically the `stringr::str_extract_all(code, 'read_chunk\\(([^)]+)\\)')` line: `[^)]+` stops at the first `)`, so the extracted text is `read_chunk(here::here("code.R")`, one closing parenthesis short. I don't know enough to tell whether `'read_chunk\\(([^)]+)\\)+'` would fix it or break anything else....
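To see the failure directly, the pattern can be tried in base R. This is a minimal sketch: `regmatches()`/`regexpr()` stand in for `stringr::str_extract_all()`, which should give the same match for a single call per line, and `parses()` and `code2` are hypothetical names used only for illustration. It also checks the `\\)+` variant: that fixes this case, but over-captures when `read_chunk()` is itself nested inside another call.

```r
# Pattern from the traceback, and the proposed variant
current  <- "read_chunk\\(([^)]+)\\)"
proposed <- "read_chunk\\(([^)]+)\\)+"

# First match of `pattern` in `code`
extract <- function(code, pattern) regmatches(code, regexpr(pattern, code))

# TRUE if the extracted text is parseable R code
parses <- function(txt) !inherits(try(parse(text = txt), silent = TRUE), "try-error")

code1 <- 'knitr::read_chunk(here::here("code.R"))'
code2 <- 'print(read_chunk("code.R"))'   # hypothetical nested call

m1 <- extract(code1, current)   # read_chunk(here::here("code.R")   : one ")" short
m2 <- extract(code1, proposed)  # read_chunk(here::here("code.R"))  : complete
m3 <- extract(code2, proposed)  # read_chunk("code.R"))             : one ")" too many

c(parses(m1), parses(m2), parses(m3))
#> [1] FALSE  TRUE FALSE
```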

Session info
```r
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.6
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2019-08-30
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.4      2019-04-10 [1] CRAN (R 3.6.0)
#>  callr         3.3.1      2019-07-18 [1] CRAN (R 3.6.0)
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.1.0      2019-07-06 [1] CRAN (R 3.6.0)
#>  digest        0.6.20     2019-07-04 [1] CRAN (R 3.6.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.6.0)
#>  knitr         1.24.4     2019-08-30 [1] Github (yihui/knitr@52edc22)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.5      2019-08-26 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.0)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.6.0)
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)
#>  rlang         0.4.0.9002 2019-08-17 [1] Github (r-lib/rlang@09fbc86)
#>  rmarkdown     1.15       2019-08-21 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.2.1      2019-07-25 [1] CRAN (R 3.6.0)
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.0)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.9        2019-08-21 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
```

By filing an issue to this repo, I promise that

  • [x] I have fully read the issue guide at https://yihui.name/issue/.
  • [x] I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • [x] I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

louisahsmith, Aug 30 '19 22:08

I guess it will work if you don't use here::here() but just use a relative path.

yihui, Aug 31 '19 03:08

Yes, it works with relative paths, but it throws an error whenever the argument contains parentheses. Is the best option, then, to add `knitr::opts_knit$set(root.dir = "")` with my project directory to all my files and then use relative paths?

louisahsmith, Sep 01 '19 22:09

> Yes, it will work with relative paths but throws an error anytime something is in there with parentheses. Is the best option then to just add knitr::opts_knit$set(root.dir = "") with my project directory to all my files and then use relative paths?

I had the same problem, but was able to use `here()` by storing its result in a local variable and then passing that to `read_chunk()`, i.e.

```r
MyPath <- here("your/relative/path/YourScript.R")
read_chunk(MyPath)
```

This allows for more general reproducibility AND extracting the R code from the Rmd.

RussellSteele, Oct 22 '20 13:10

The root cause of this issue appears to be that purl() uses a heuristic string-matching method for processing calls to "read_chunk()". It is treating the chunk code as a string and pattern matching on the string, instead of parsing the chunk text as code and then pattern matching on the code syntax tree.

If you look at frame 8 of the traceback in the reprex below, you will see a regular expression that looks for the string "read_chunk(" and captures everything from the "(" immediately after "read_chunk" up to the first following ")". It then attempts to parse this string.

Unfortunately, that captured parenthesized string is not guaranteed to be syntactically well-formed, even if the original code in the chunk is syntactically well-formed. For example, in the reprex below:

  1. The first ")" after "read_chunk(" is the right parenthesis in the character argument "lines = ')' ".
  2. The function call is actually "fake_read_chunk()" not "read_chunk()". The heuristic doesn't care, it is looking for any string containing "read_chunk(".
  3. The line of code containing "read_chunk(" is actually commented out. Again, the heuristic doesn't care, it is looking for any string containing "read_chunk(" even if it's not executable code.

I presume that to avoid these issues, purl() would have to parse the chunk text as code first, and then search the syntax tree generated from the code.

```r
tmp_rmd <- tempfile()
lines_rmd <- c(
  "---",
  "title: 'test report'",
  "output: html_document",
  "---",
  "",
  "```{r cache = FALSE}",
  "fake_read_chunk <- function(lines){'Hi!'}",
  "# fake_read_chunk(lines = ')')",
  "```"
)
writeLines(lines_rmd, tmp_rmd)
knitr::purl(tmp_rmd, output = stdout())
#> processing file: /tmp/RtmpzV0wxT/file60f315b31793
#> Quitting from lines 7-9 (/tmp/RtmpzV0wxT/file60f315b31793)
#> Error in parse(text = code, keep.source = FALSE): <text>:1:20: unexpected INCOMPLETE_STRING
#> 1: read_chunk(lines = ')
#>                        ^
traceback()
#> 9: parse(text = code, keep.source = FALSE)
#> 8: parse_only(unlist(stringr::str_extract_all(code, "read_chunk\\(([^)]+)\\)")))
#> 7: eval(parse_only(unlist(stringr::str_extract_all(code, "read_chunk\\(([^)]+)\\)"))))
#> 6: process_tangle.block(group)
#> 5: process_tangle(group)
#> 4: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), 
#>                        error = function(e) {
#>                         setwd(wd)
#>                          cat(res, sep = "\n", file = output %n% "")
#>                          message("Quitting from lines ", paste(current_lines(i), 
#>                                                                collapse = "-"), " (", knit_concord$get("infile"), 
#>                                  ") ")
#>                        })
#> 3: process_file(text, output)
#> 2: knit(..., tangle = TRUE)
#> 1: knitr::purl(tmp_rmd, output = stdout())
```
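The parse-first approach suggested above can be sketched in base R. This is a minimal sketch, not knitr's code: `find_read_chunk_calls()` is a hypothetical helper that parses the chunk with `parse()` and walks the resulting expressions, collecting calls whose function is `read_chunk` or `knitr::read_chunk`. Comments and string contents never become calls in the syntax tree, and `fake_read_chunk` is a different symbol, so none of the three problems listed above trigger it. It assumes the chunk code is syntactically valid R; `parse()` errors otherwise, so a real fix would still need a fallback for such chunks.

```r
# Hypothetical helper (not part of knitr): collect read_chunk() calls by walking
# the parsed syntax tree instead of pattern-matching on the raw chunk text.
find_read_chunk_calls <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  found <- list()

  is_read_chunk <- function(fn) {
    identical(fn, as.name("read_chunk")) ||
      # knitr::read_chunk(...) has the function part `::`(knitr, read_chunk)
      (is.call(fn) && length(fn) == 3L && identical(fn[[1]], as.name("::")) &&
         identical(fn[[3]], as.name("read_chunk")))
  }

  walk <- function(e) {
    if (!is.call(e)) return(invisible(NULL))
    if (is_read_chunk(e[[1]])) found[[length(found) + 1L]] <<- e
    # Recurse into the function part and the arguments, skipping empty
    # arguments (as in x[, 1]), which cannot be passed around safely
    for (i in seq_along(e)) {
      if (!identical(e[[i]], quote(expr = ))) walk(e[[i]])
    }
  }

  for (e in exprs) walk(e)
  found
}

chunk <- c(
  "fake_read_chunk <- function(lines){'Hi!'}",
  "# fake_read_chunk(lines = ')')",
  "rc <- knitr::read_chunk(here::here('code.R'))  # don't use other_function()"
)
calls <- find_read_chunk_calls(chunk)
length(calls)
#> [1] 1
calls[[1]]
#> knitr::read_chunk(here::here("code.R"))
```

Because the helper returns call objects rather than strings, the assignment in `rc <- ...` is naturally left out and only the `read_chunk()` call itself is collected.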

The biggest issue here is that syntactically correct chunk code can break purl().

This is a problem for other software that relies on purl(). For example, tarchetypes::tar_knitr_deps() uses purl() to extract the R code from R Markdown documents so that it can look for dependencies that it needs to monitor; see https://github.com/ropensci/tarchetypes/issues/51.

rgayler, May 16 '21 11:05

@rgayler Your diagnosis was completely correct. I'm not sure what the best solution would be.

For the case of fake_read_chunk(), it is easy to fix: I can use the regex `\\bread_chunk`, i.e., make sure there is a word boundary before read_chunk, so it won't match fake_read_chunk.

For the wrong closing parenthesis ), regex can't help.

I wonder whether adding `\b` to the regex and filtering out comment lines would be enough to solve your original problem.
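A quick base-R check of that idea, as a sketch only: it assumes "comment lines" means lines whose first non-blank character is `#`, and it uses `perl = TRUE` so that `\\b` is handled by PCRE, which should agree with stringr's ICU engine for plain ASCII code.

```r
chunk <- c(
  "fake_read_chunk <- function(lines){'Hi!'}",
  "# fake_read_chunk(lines = ')')",
  "  # read_chunk('old.R')",
  "knitr::read_chunk('code.R')"
)

# Drop lines that are entirely comments (first non-blank character is #)
code <- chunk[!grepl("^\\s*#", chunk, perl = TRUE)]

# \b needs a word boundary before "read_chunk", so fake_read_chunk no longer matches
hits <- regmatches(code, regexpr("\\bread_chunk\\([^)]+\\)", code, perl = TRUE))
hits
#> [1] "read_chunk('code.R')"
```

This does nothing for a `)` inside a string argument such as `read_chunk(lines = ')')`, which remains the case regex can't help with.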

yihui, May 17 '21 15:05

@yihui thanks for your rapid response.

I think I had two problems:

  1. My code broke purl() (because I assumed, incorrectly, that the chunk code was being syntactically analysed as code).
  2. I was asking purl() to do something it can't do (execute expressions in the code).

> I wonder whether adding `\b` to the regex and filtering out comment lines would be enough to solve your original problem.

Yes - I think that 99.999% solves problem 1. (The missing 0.001% is because maybe there's some exotic edge-case syntactically correct code that the regex doesn't find.)

I know you have gone to some lengths to emphasise that executing the purled code is not guaranteed to be identical to knitting the same document. However, problem 2 is different and, I think, not mentioned in the documentation. I think it would be very helpful if you added some text like the following to the help pages of purl() and read_chunk() (and any other similar special cases):

Warning

purl() only extracts the code from the chunks and does not execute any part of that code. purl() attempts to read in any external scripts that knit() would access by read_chunk() and <list other special cases, e.g. source(), here>. knit() can execute an arbitrary expression to yield a file path to the external script to be included. However, because purl() does not execute the extracted code, the path argument pointing to an external script to be included must be a literal string constant. If your document contains a call to read_chunk() with the path argument being an expression, rather than a string constant, purl() will definitely not include the external script. Depending on the expression, purl() may also generate an uninformative error message because of the way it processes these external scripts as special cases.
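The rule in that proposed warning (the path must be a literal string constant) is something that could be checked statically. A minimal sketch with a hypothetical helper `path_is_literal()`: it matches a call's arguments against `read_chunk_stub()`, a simplified stand-in signature whose first argument is `path`. That stand-in is an assumption made for illustration, not `read_chunk()`'s exact definition.

```r
# Simplified stand-in for read_chunk()'s signature (assumption), used only so
# that match.call() can tell which argument is `path`
read_chunk_stub <- function(path, lines, labels = NULL, ...) NULL

# Hypothetical helper: TRUE when the call's path argument is a single string literal
path_is_literal <- function(call) {
  args <- as.list(match.call(read_chunk_stub, call))
  path <- args[["path"]]   # NULL when no path was supplied
  is.character(path) && length(path) == 1L
}

path_is_literal(quote(read_chunk("code.R")))                     # TRUE
path_is_literal(quote(knitr::read_chunk(here::here("code.R"))))  # FALSE: an expression
path_is_literal(quote(read_chunk(MyPath)))                       # FALSE: a variable
```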

rgayler, May 17 '21 22:05

> For the wrong closing parenthesis `)`, regex can't help.

@yihui I have a suggestion on how to fix the closing parenthesis being missed.

In the second line here,

https://github.com/yihui/knitr/blob/77970b0717a9497b5ba275689a21c0af2e58ab07/R/block.R#L590-L591

rather than using the regex `read_chunk\\(([^)]+)\\)` with `str_extract` to get the arguments of `read_chunk`, an option might be to use the regex from the line before it, `read_chunk\\(.+\\)`, and then manually rebuild the `read_chunk` call.

So, replace this line in process_tangle.block

https://github.com/yihui/knitr/blob/77970b0717a9497b5ba275689a21c0af2e58ab07/R/block.R#L591

with

```r
eval(parse_only(
  paste0(
    "read_chunk(",
    str_sub(unlist(str_extract(code, '\\bread_chunk\\(.+\\)')), start = 12, end = -2),
    ")"
  )
))
```

Basically, just manually drop read_chunk( from the start and ) from the end of each string returned by the regex, and then add them in again using paste0. Since we are reconstructing the same call, this shouldn't break anything (edge cases)?
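A base-R check of that proposal, as a sketch: `substr(m, 12, nchar(m) - 1)` stands in for `str_sub(..., start = 12, end = -2)`, and `code` holds two example lines, the second a hypothetical nested call. Because `.+` is greedy, it runs to the last `)` on the line, which recovers `here::here()` paths. Stripping the 11 characters of `read_chunk(` and the final `)` and then pasting them back returns the match unchanged, so the parseability of the result depends only on the regex. A `read_chunk()` nested inside another call still over-captures:

```r
code <- c('knitr::read_chunk(here::here("code.R"))',
          'print(read_chunk("code.R"))')   # hypothetical nested call

# Greedy .+ runs to the last ")" on each line
m <- regmatches(code, regexpr("\\bread_chunk\\(.+\\)", code, perl = TRUE))

# Drop "read_chunk(" (11 characters) and the final ")", then paste them back
rebuilt <- paste0("read_chunk(", substr(m, 12, nchar(m) - 1), ")")
identical(rebuilt, m)
#> [1] TRUE

# Which rebuilt calls are parseable R code?
vapply(rebuilt,
       function(txt) !inherits(try(parse(text = txt), silent = TRUE), "try-error"),
       logical(1), USE.NAMES = FALSE)
#> [1]  TRUE FALSE
```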

Caveat: this still has issues with comments. Both of these still cause problems:

```r
read_chunk(lines = "hello") # don't use other_function()
# read_chunk(lines = "hello")
```

You spoke above about stripping out comments, but I'm not sure how you would do that (with a regex too?):

> I wonder whether adding `\b` to the regex and filtering out comment lines would be enough to solve your original problem.

If this works, it means path calls using here::here() should also work (and this solves the same problem that rmarkdown::render() also has).

mrdowdeswell, Feb 12 '23 11:02

> ```r
> read_chunk(lines = "hello") # don't use other_function()
> # read_chunk(lines = "hello")
> ```

Using `(?<!(#\\s)|(#))\\bread_chunk\\(.+\\)` as the regex deals with the second of these. The problem with the first one is the appearance of parentheses, specifically `)`, anywhere in the comment.

And, this is also a (comment) problem still:

```r
# don't call read_chunk("file/path")
```

And perhaps assignment is broken too :-/

```r
rc <- read_chunk("file/path")
```

since the assignment won't appear in the eval() (using either method). This seems like a non-use-case though.
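A quick check of that lookbehind regex, as a sketch: it uses `grepl(perl = TRUE)` and assumes PCRE treats this lookbehind the same way as stringr's ICU engine. The lookbehind only inspects the one or two characters immediately before `read_chunk`, so it rejects `# read_chunk(...)` and `#read_chunk(...)` but still accepts a comment with other words in between:

```r
lookbehind <- "(?<!(#\\s)|(#))\\bread_chunk\\(.+\\)"
x <- c('read_chunk("a.R")',
       '# read_chunk("a.R")',
       '#read_chunk("a.R")',
       "# don't call read_chunk(\"file/path\")")
grepl(lookbehind, x, perl = TRUE)
#> [1]  TRUE FALSE FALSE  TRUE
```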

mrdowdeswell, Feb 12 '23 12:02

@mrdowdeswell Please feel free to submit a pull request. You don't have to solve all the problems above. I think it'll be great even if we can only fix the original problem at the top. For the other cases, we can try our best, but I guess the solution won't be totally robust. Any improvement is better than none. Thanks!

yihui, Feb 14 '23 00:02