rmarkdown icon indicating copy to clipboard operation
rmarkdown copied to clipboard

Argument keep_md not inherited by custom output format that extends html_document()

Open jdblischak opened this issue 6 years ago • 3 comments

By filing an issue to this repo, I promise that

  • [x] I have fully read the issue guide at https://yihui.name/issue/.
  • [x] I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('rmarkdown'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/rmarkdown').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • [x] I have learned the Github Markdown syntax, and formatted my issue correctly.

Note: Sorry for the strange line wrapping below. I used the output formatgithub_document(), but apparently GitHub markdown now respects line breaks. I manually fixed a few of the more egregious breaks below.


I have created a custom output format that extends html_document(). I noticed that the argument keep_md is not properly inherited.

My first thought was that both the arguments keep_md and df_print might have issues since they are arguments for both output_format() and html_document(), but somehow df_print is properly inherited.

intersect(names(formals(rmarkdown::output_format)),
          names(formals(rmarkdown::html_document)))
## [1] "keep_md"  "df_print"

Here is a minimal example to demonstrate the issue. The function below is a trivial output format that wraps html_document(), but doesn’t change anything.

# A trivial wrapper for html_document()
html_wrap <- function(...) {
  rmarkdown::output_format(knitr = rmarkdown::knitr_options(),
                           pandoc = rmarkdown::pandoc_options(to = "html"),
                           base_format = rmarkdown::html_document(...))
}

If a user runs render() on an R Markdown file with the following YAML header:

---
output: html_wrap
---

then this would result in the following call to create_output_format():

o1 <- rmarkdown:::create_output_format(name = "html_wrap",
                                       options = list())

This inherits the default values from html_document() as expected:

environment(environment(o1[["pre_knit"]])[["overlay"]])[["toc"]]
## [1] FALSE
o1$keep_md
## [1] FALSE
environment(environment(o1[["pre_knit"]])[["overlay"]])[["keep_md"]]
## [1] FALSE
o1$df_print
## [1] "default"
environment(environment(o1[["pre_knit"]])[["overlay"]])[["df_print"]]
## [1] "default"

However, if a user runs render() on an R Markdown file with the following YAML header:

---
output:
  html_wrap:
    toc: true
    keep_md: TRUE
    df_print: "kable"
---

then this would result in the following call to create_output_format() (line 378):

o2 <- rmarkdown:::create_output_format(name = "html_wrap",
                                       options = list(toc = TRUE,
                                                      keep_md = TRUE,
                                                      df_print = "kable"))

toc is inherited as expected:

environment(environment(o2[["pre_knit"]])[["overlay"]])[["toc"]]
## [1] TRUE

df_print is inherited not only in the environment of the pre_knit() function, but the value of the output format is also inherited:

o2$df_print
## [1] "kable"
environment(environment(o2[["pre_knit"]])[["overlay"]])[["df_print"]]
## [1] "kable"

Unfortunately, keep_md is only updated in the environment of the pre_knit() function, where it has no effect.

o2$keep_md
## [1] FALSE
environment(environment(o2[["pre_knit"]])[["overlay"]])[["keep_md"]]
## [1] TRUE

When render() adds rmarkdown.keep_md to knitr::opts_knit, it uses output_format$keep_md (line 500). When render() decides to keep the md file, it again uses output_format$keep_md (line 869).

Potentially related Issues include #842 and #928.

Note that bookdown::html_document2() does not have this issue.

rmarkdown:::create_output_format(name = "bookdown::html_document2",
                                 options = list(keep_md = TRUE))$keep_md
## [1] TRUE

But it is unclear to me what part of its code would change the inheritance of keep_md.

bookdown::html_document2
## function (..., number_sections = TRUE, pandoc_args = NULL, base_format = rmarkdown::html_document) 
## {
##     base_format = get_base_format(base_format)
##     config = base_format(..., number_sections = number_sections, 
##         pandoc_args = pandoc_args2(pandoc_args))
##     post = config$post_processor
##     config$post_processor = function(metadata, input, output, 
##         clean, verbose) {
##         if (is.function(post)) 
##             output = post(metadata, input, output, clean, verbose)
##         x = read_utf8(output)
##         x = restore_appendix_html(x, remove = FALSE)
##         x = restore_part_html(x, remove = FALSE)
##         x = resolve_refs_html(x, global = !number_sections)
##         x = clean_pandoc2_highlight_tags(x)
##         write_utf8(x, output)
##         output
##     }
##     config$bookdown_output_format = "html"
##     config = set_opts_knit(config)
##     config
## }
## <bytecode: 0x558e41c8af40>
## <environment: namespace:bookdown>

It appears to be directly editing the base output format instead of inheriting from it using output_format(base_format = ), which is what the documentation recommends.

What is the recommended method for passing all arguments provided to a custom output format to the base output format that it extends?

I realize that I could hard-code a work-around for the custom output format to handle keep_md, but this is fragile. If a future release of rmarkdown added a new argument to output_format()/html_document(), I would have to update my code and also put a restriction on the minimum version of rmarkdown that is compatible.

html_wrap2 <- function(...) {
  
  opts <- list(...)
  
  if (!is.null(opts$keep_md)) keep_md <- opts$keep_md else keep_md <- FALSE
  
  rmarkdown::output_format(knitr = rmarkdown::knitr_options(),
                           pandoc = rmarkdown::pandoc_options(to = "html"),
                           keep_md = keep_md,
                           base_format = rmarkdown::html_document(...))
}
rmarkdown:::create_output_format(name = "html_wrap2",
                                 options = list(keep_md = TRUE))$keep_md
## [1] TRUE

jdblischak avatar Mar 27 '19 18:03 jdblischak

Hi @jdblischak, sorry for the delay.

keep_md and clean_supporting have default value in output_format(). If you don't change this value, this will explicitly says to not use the one from base format. The default value of output_format() will be merged with the one for base_format as, will have precedence. keep_md is FALSE by default in output_format

html_wrap <- function(...) {
  rmarkdown::output_format(knitr = rmarkdown::knitr_options(),
                           pandoc = rmarkdown::pandoc_options(to = "html"),
                           base_format = rmarkdown::html_document(...))
}

html_wrap()$keep_md
#> [1] FALSE
html_wrap(keep_md = TRUE)$keep_md
#> [1] FALSE

However, you can set the value to NULL to explicitly use the one from base_format(),

html_wrap <- function(...) {
  rmarkdown::output_format(knitr = rmarkdown::knitr_options(),
                           pandoc = rmarkdown::pandoc_options(to = "html"),
                           keep_md = NULL,
                           base_format = rmarkdown::html_document(...))
}

html_wrap()$keep_md
#> [1] FALSE
html_wrap(keep_md = TRUE)$keep_md
#> [1] TRUE

Currently there is also the option to do as some rmarkdown format by providing explictly the argument

html_wrap <- function(keep_md = FALSE, ...) {
  rmarkdown::output_format(knitr = rmarkdown::knitr_options(),
                           keep_md = keep_md,
                           pandoc = rmarkdown::pandoc_options(to = "html"),
                           base_format = rmarkdown::html_document(...))
}

html_wrap()$keep_md
#> [1] FALSE
html_wrap(keep_md = TRUE)$keep_md
#> [1] TRUE

or as some of the bookdown format, by modifying directly a format (this way you can decide the merging rule too)

html_wrap <- function(...) {
  base_format <- rmarkdown::html_document(...)
  base_format$knitr <- rmarkdown::knitr_options()
  base_format$pandoc <- rmarkdown::pandoc_options(to = "html")
  base_format
}

html_wrap()$keep_md
#> [1] FALSE
html_wrap(keep_md = TRUE)$keep_md
#> [1] TRUE

This is the current state of how this works.

I understand this is not what you expected. I guess the default value in output_format() are disturbing, and if not provided then this would be equivalent to them being NULL, as some of the other argument.

If we change the current default behavior (by inheriting from base format if keep_md not explictly provided), this may have consequences on existing format that we need to assess.

Do you still find it is not suitable from the example above ? Thanks for the feedback and again sorry for the delay.

cderv avatar Oct 20 '20 11:10 cderv

@jdblischak what are your thoughts on the above ? I believe there is solution for what you are looking but they may require better documentation if it was not obvious for you when trying to create a new output format.

cderv avatar Dec 07 '20 15:12 cderv

@cderv Thanks for diving into this complicated issue and following up.

Ultimately I think this is a documentation issue. Some arguments to html_document(), e.g. keep_md, are not inherited like the rest. Looking at ?html_document, ?output_format, and Chapter 18 Creating New Formats from the R Markdown book, there is no indication that the arguments can behave differently. From section 18.2:

You can also pass a base_format to the output_format() function if you want to inherit all of the behavior of an existing format but tweak a subset of its options.

Thus currently the only way to learn that keep_md isn't inherited is the hard way.

For my purposes, I am fine with the workaround I implemented for workflowr (which is essentially what I proposed above). My main concern would be if new arguments were added to html_document() that were also not inherited. I love how the output formats can inherit features, because it allows workflowr users to take advantage of any feature added to html_document().

Of the options you proposed above, I think the most straightforward option is to explicitly supply the argument keep_md. Thus I recommend that this be the solution that is added to the documentation.

jdblischak avatar Dec 08 '20 01:12 jdblischak