knitr
Error when purl-ing a file that contains read_chunk(extra parentheses)
I have a document `test.Rmd` which consists only of:
```{r}
knitr::read_chunk(here::here("code.R"))
```
(`code.R` exists but could be empty.) When I run `knitr::purl("test.Rmd")` I get:
processing file: test.Rmd
|.................................................................| 100%
Quitting from lines 2-3 (test.Rmd)
Error in parse(text = code, keep.source = FALSE) :
<text>:2:0: unexpected end of input
1: read_chunk(here::here("code.R")
^
When the `test.Rmd` file consists of `knitr::read_chunk("code.R")`, it runs with no problem.
The problem seems to be the `process_tangle.block` function, specifically the `stringr::str_extract_all(code, 'read_chunk\\(([^)]+)\\)')` line, since it only matches up to the first `)`. I don't know enough to know if `'read_chunk\\(([^)]+)\\)+'` would fix it or break anything else.
Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2019-08-30
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
#> callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
#> digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
#> knitr 1.24.4 2019-08-30 [1] Github (yihui/knitr@52edc22)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.5 2019-08-26 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> rlang 0.4.0.9002 2019-08-17 [1] Github (r-lib/rlang@09fbc86)
#> rmarkdown 1.15 2019-08-21 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.9 2019-08-21 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
I guess it will work if you don't use `here::here()` but just use a relative path.
Yes, it works with relative paths, but it throws an error any time the argument contains parentheses. Is the best option then to add `knitr::opts_knit$set(root.dir = "")`, with my project directory, to all my files and then use relative paths?
I had this same problem, but was able to use `here()` by assigning the path to a variable first:
MyPath<-here("your/relative/path/YourScript.R")
read_chunk(MyPath)
This allows for more general reproducibility and for extracting the R code from the Rmd.
The root cause of this issue appears to be that `purl()` uses a heuristic string-matching method to process calls to `read_chunk()`. It treats the chunk code as a string and pattern-matches on that string, instead of parsing the chunk text as code and then matching on the syntax tree.
If you look at line 8 of the traceback in the reprex below, you will see a regular expression that looks for the string `read_chunk(` and captures everything from that `(` up to the first following `)`. `purl()` then attempts to parse the captured string.
Unfortunately, the captured string is not guaranteed to be syntactically well-formed, even if the original chunk code is. For example, in the reprex below:
- The first `)` after `read_chunk(` is the right parenthesis inside the character argument `lines = ')'`.
- The function call is actually `fake_read_chunk()`, not `read_chunk()`. The heuristic doesn't care; it looks for any string containing `read_chunk(`.
- The line of code containing `read_chunk(` is actually commented out. Again, the heuristic doesn't care; it matches `read_chunk(` even in text that is not executable code.
I presume that to avoid these issues, `purl()` would have to parse the chunk text as code first, and then search the resulting syntax tree.
tmp_rmd <- tempfile()
lines_rmd <- c(
"---",
"title: 'test report'",
"output: html_document",
"---",
"",
"```{r cache = FALSE}",
"fake_read_chunk <- function(lines){'Hi!'}",
"# fake_read_chunk(lines = ')')",
"```"
)
writeLines(lines_rmd, tmp_rmd)
knitr::purl(tmp_rmd, output = stdout())
#> processing file: /tmp/RtmpzV0wxT/file60f315b31793
#> Quitting from lines 7-9 (/tmp/RtmpzV0wxT/file60f315b31793)
#> Error in parse(text = code, keep.source = FALSE): <text>:1:20: unexpected INCOMPLETE_STRING
#> 1: read_chunk(lines = ')
#> ^
traceback()
#> 9: parse(text = code, keep.source = FALSE)
#> 8: parse_only(unlist(stringr::str_extract_all(code, "read_chunk\\(([^)]+)\\)")))
#> 7: eval(parse_only(unlist(stringr::str_extract_all(code, "read_chunk\\(([^)]+)\\)"))))
#> 6: process_tangle.block(group)
#> 5: process_tangle(group)
#> 4: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group),
#> error = function(e) {
#> setwd(wd)
#> cat(res, sep = "\n", file = output %n% "")
#> message("Quitting from lines ", paste(current_lines(i),
#> collapse = "-"), " (", knit_concord$get("infile"),
#> ") ")
#> })
#> 3: process_file(text, output)
#> 2: knit(..., tangle = TRUE)
#> 1: knitr::purl(tmp_rmd, output = stdout())
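The parse-first approach suggested above could look roughly like the following. This is only a sketch in base R, not knitr's implementation: `find_read_chunk_calls()` is a hypothetical helper that parses the chunk text and walks the resulting expressions, collecting calls whose function is `read_chunk` or `pkg::read_chunk`. It assumes the chunk is syntactically valid R, and it does not look inside default argument values of function definitions.

```r
# Hypothetical helper (not part of knitr): return every call to read_chunk()
# found in `code`, a character vector of syntactically valid R source lines.
find_read_chunk_calls <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  found <- list()

  # TRUE if `fn` is the symbol read_chunk, or pkg::read_chunk / pkg:::read_chunk.
  is_read_chunk <- function(fn) {
    if (is.name(fn)) return(identical(as.character(fn), "read_chunk"))
    is.call(fn) && length(fn) == 3L && is.name(fn[[1]]) &&
      as.character(fn[[1]]) %in% c("::", ":::") &&
      identical(as.character(fn[[3]]), "read_chunk")
  }

  # Recursively visit calls; names, constants and empty arguments are skipped.
  walk <- function(x) {
    if (is_read_chunk(x[[1]])) found[[length(found) + 1L]] <<- x
    for (i in seq_along(x)) {
      if (is.call(x[[i]])) walk(x[[i]])
    }
  }

  for (e in exprs) {
    if (is.call(e)) walk(e)
  }
  found
}

# The chunk from the reprex: a different function name and a commented-out call.
reprex_code <- c(
  "fake_read_chunk <- function(lines){'Hi!'}",
  "# fake_read_chunk(lines = ')')"
)
print(length(find_read_chunk_calls(reprex_code)))

# The chunk from the original report: one real call with a nested call inside.
calls <- find_read_chunk_calls('knitr::read_chunk(here::here("code.R"))')
print(calls[[1]])
```

Because comments and string contents never become calls in the parsed expressions, none of the three failure modes listed above can produce a false match here. Whether the argument expression can then be evaluated during `purl()` is a separate question.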
The biggest issue here is that syntactically correct chunk code can break `purl()`.
This is a problem for other software that relies on `purl()`. For example, `tarchetypes::tar_knitr_deps()` uses `purl()` to extract the R code from R Markdown documents so that it can look for dependencies that it needs to monitor; see https://github.com/ropensci/tarchetypes/issues/51
@rgayler Your diagnosis was completely correct. I'm not sure what the best solution would be.
For the case of `fake_read_chunk()`, it is easy to fix: I can use the regex `\\bread_chunk`, i.e., make sure there is a word boundary before `read_chunk`, so it won't match `fake_read_chunk`.
For the wrong closing parenthesis `)`, regex can't help.
I wonder if adding `\b` to the regex and filtering out comment lines would be enough to solve your original problem.
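The combination of a word boundary and a comment-line filter can be tried out on a few sample lines. This is a rough sketch in base R under assumed inputs (the three `lines` are made up for illustration), not the change that was made in knitr.

```r
lines <- c(
  "fake_read_chunk(lines = 'x')",
  "# read_chunk(lines = ')')",
  'read_chunk("code.R")'
)

# Without the word boundary, every line looks like a read_chunk() call.
loose <- grepl("read_chunk\\(", lines, perl = TRUE)

# Skip lines that consist only of a comment, then require a word boundary.
# "_" counts as a word character, so "\\b" does not match inside fake_read_chunk.
is_comment <- grepl("^\\s*#", lines, perl = TRUE)
strict <- !is_comment & grepl("\\bread_chunk\\(", lines, perl = TRUE)

print(data.frame(line = lines, loose = loose, strict = strict))
```

Only whole-line comments are filtered in this sketch; a `read_chunk(` mentioned in a trailing comment after real code would still be matched.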
@yihui thanks for your rapid response.
I think I had two problems:
- My code broke `purl()` (because I assumed, incorrectly, that the chunk code was being syntactically analysed as code).
- I was asking `purl()` to do something it can't do (execute expressions in the code).

> I wonder if adding `\b` to the regex and filtering out comment lines would be enough to solve your original problem.
Yes - I think that 99.999% solves problem 1. (The missing 0.001% is because maybe there's some exotic edge-case syntactically correct code that the regex doesn't find.)
I know you have gone to some lengths to emphasise that executing the purled code is not guaranteed to be identical to knitting the same document. However, problem 2 is different and, I think, not mentioned in the documentation. I think it would be very helpful if you added some text like the following to the help pages of `read_chunk()` (and any other similar special cases) and `purl()`:

> Warning
>
> `purl()` only extracts the code from the chunks and does not execute any part of that code. `purl()` attempts to read in any external scripts that `knit()` would access by `read_chunk()` and <list other special cases, e.g. `source()`, here>. `knit()` can execute an arbitrary expression to yield a file path to the external script to be included. However, because `purl()` does not execute the extracted code, the path argument pointing to an external script to be included must be a literal string constant. If your document contains a call to `read_chunk()` with the `path` argument being an expression, rather than a string constant, `purl()` will definitely not include the external script. Depending on the expression, `purl()` may also generate an uninformative error message because of the way it processes these external scripts as special cases.
> For the wrong closing parenthesis `)`, regex can't help.
@yihui I have a suggestion on how to fix the closing parenthesis being missed.
In the second line here,
https://github.com/yihui/knitr/blob/77970b0717a9497b5ba275689a21c0af2e58ab07/R/block.R#L590-L591
rather than using the regex `read_chunk\\(([^)]+)\\)` with `str_extract` to get the arguments of `read_chunk`, an option might be to use the regex from the line before that, `read_chunk\\(.+\\)`, and then manually rebuild the `read_chunk` call.
So, replace this line in `process_tangle.block`
https://github.com/yihui/knitr/blob/77970b0717a9497b5ba275689a21c0af2e58ab07/R/block.R#L591
with
eval(parse_only(
  paste0(
    "read_chunk(",
    str_sub(unlist(str_extract(code, '\\bread_chunk\\(.+\\)')), start = 12, end = -2),
    ")"
  )
))
Basically, just manually drop `read_chunk(` from the start and `)` from the end of each string returned by the regex, and then add them in again using `paste0`. Since we are reconstructing the same call, this shouldn't break anything (edge cases)?
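The effect of the greedy match and the rebuild step can be checked in isolation. The sketch below mirrors the idea with base R functions instead of stringr (an assumption to avoid the dependency) and only parses the rebuilt call; it does not evaluate it, and it is not the code that ended up in knitr.

```r
code <- 'knitr::read_chunk(here::here("code.R"))'

# Greedy match: from "read_chunk(" through the last ")" on the line.
m <- regmatches(code, regexpr("\\bread_chunk\\(.+\\)", code, perl = TRUE))

# Drop the 11-character prefix "read_chunk(" and the final ")".
call_args <- substr(m, 12, nchar(m) - 1)
print(call_args)

# Rebuild the call around the extracted arguments.
rebuilt <- paste0("read_chunk(", call_args, ")")
print(rebuilt)

# The rebuilt string is balanced and parses as a single call.
expr <- parse(text = rebuilt, keep.source = FALSE)[[1]]
print(expr)
```

Because `.+` is greedy, the match runs to the last `)` on the line, which is what keeps the nested `here::here()` call intact.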
Caveat: this still has issues dealing with comments. Both of these cause issues still:
read_chunk(lines = "hello") # don't use other_function()
# read_chunk(lines = "hello")
You spoke above about stripping out comments, but I'm not sure how you would do that (with a regex too?):

> I wonder if adding `\b` to the regex and filtering out comment lines would be enough to solve your original problem.
If this works, it means path calls using `here::here()` should also work (and this solves the same problem that `rmarkdown::render()` also has).
read_chunk(lines = "hello") # don't use other_function()
# read_chunk(lines = "hello")
Using `(?<!(#\\s)|(#))\\bread_chunk\\(.+\\)` as the regex deals with the second of these. The problem with the first one is the appearance of parentheses, specifically `)`, anywhere in the comment.
And this is also still a problem (with comments):
# don't call read_chunk("file/path")
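The behaviour of the lookbehind pattern on these cases can be checked with a short sketch. It uses base R's `grepl()` with `perl = TRUE` rather than stringr (an assumption; the two regex engines are expected to agree on this pattern), and the `candidates` vector is made up from the examples in this thread.

```r
# Negative lookbehind: reject a match directly preceded by "#" or "# ".
pattern <- "(?<!(#\\s)|(#))\\bread_chunk\\(.+\\)"

candidates <- c(
  'read_chunk(lines = "hello")',
  '# read_chunk(lines = "hello")',
  '#read_chunk(lines = "hello")',
  '# don\'t call read_chunk("file/path")'
)

matched <- grepl(pattern, candidates, perl = TRUE)
print(setNames(matched, candidates))
```

The lookbehind only inspects the one or two characters immediately before `read_chunk`, so a comment marker further to the left on the same line goes unnoticed.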
And perhaps assignment is broken too :-/
rc <- read_chunk("file/path")
since the assignment won't appear in the `eval()` (using either method). This seems like a non-use-case though.
@mrdowdeswell Please feel free to submit a pull request. You don't have to solve all the problems above. I think it'll be great even if we can only fix the original problem at the top. For other cases, we can try our best, but I guess the solution won't be totally robust. Any improvement is better than none. Thanks!