Allow `output_file` to be a function in `render()` [FR]

Open mikmart opened this issue 3 years ago • 8 comments

It would be useful to have an easy way to reuse metadata included in the YAML front matter of an RMarkdown file for generating output file names when rendering. This originally came up on StackOverflow.

At the moment it's simple enough to do with yaml_front_matter() if the metadata doesn't include R expressions. Just extract the metadata from the RMarkdown file, and build the output file name.
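A minimal sketch of that simple case, assuming the front matter holds only literal `title` and `author` fields (no inline R). The helper names `build_output_name()` and `render_with_static_name()` are made up for illustration and are not part of rmarkdown:

```r
# Hypothetical helper: build an output file name from front matter fields.
build_output_name <- function(metadata) {
  with(metadata, paste(title, "by", author))
}

# Hypothetical wrapper: read the un-rendered front matter, then render.
# Only works as intended when the fields used contain no R expressions.
render_with_static_name <- function(input_file) {
  metadata <- rmarkdown::yaml_front_matter(input_file)
  rmarkdown::render(input_file, output_file = build_output_name(metadata))
}

build_output_name(list(title = "Untitled", author = "Jane Doe"))
```

Since the name has no extension, render() adds one matching the output format.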

However, it becomes quite a bit more complicated if the metadata fields contain expressions that need to be rendered first. One way to do that is to keep the intermediate Markdown file, and extract the metadata from there. For example:

---
title: "Untitled"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: word_document
knit: >
  (function(input_file, encoding) {
    # Render, keeping intermediate files for extracting front matter
    md_dir <- tempdir()
    output_file_temp <- rmarkdown::render(
      input = input_file,
      output_file = tempfile(),
      intermediates_dir = md_dir,
      clean = FALSE
    )
    
    # Get the rendered front matter from the intermediate Markdown file
    md_file <- fs::path_ext_set(fs::path_file(input_file), ".knit.md")
    metadata <- rmarkdown::yaml_front_matter(fs::path(md_dir, md_file))
    
    # Build output file name based on rendered metadata
    output_name <- with(metadata, paste(title, "by", author, "on", date))
    output_ext <- fs::path_ext(output_file_temp)
    output_file <- fs::path_ext_set(output_name, output_ext)
    
    fs::file_move(output_file_temp, output_file)
  })
---

Would it be possible to allow the output_file argument to render() to be a function that gets the rendered metadata as an argument? The above could then become:

---
title: "Untitled"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: word_document
knit: >
  (function(input_file, encoding) {
    rmarkdown::render(
      input = input_file,
      output_file = function(metadata) {
        with(metadata, paste(title, "by", author, "on", date))
      }
    )
  })
---

And the resulting output file would be named Untitled by Jane Doe on 2022-02-18.docx.

mikmart avatar Feb 18 '22 15:02 mikmart

I would suggest defining a custom format and tweaking post_processor with rmarkdown::output_format(). That way, you do not have to write knit in the YAML front matter.

atusy avatar Feb 18 '22 16:02 atusy

Oh thanks for pointing that out! post_processor seems like exactly the kind of thing to solve this.

Needing to define a whole new output format seems like a bit much though, considering that this has nothing to do with the actual file format. You'd basically need to write a wrapper format that exposes the post_processor argument for each actual output format that you want to use (docx, pdf, html, etc.).

Maybe there'd be a way to add a function that takes an existing format and adds this functionality...

mikmart avatar Feb 18 '22 16:02 mikmart

That sounds like a reasonable feature request, and is indeed a common request, too. I agree that defining a new output format is too much only for this purpose.

I feel this should not be hard to implement. If anyone wants to submit a pull request, please feel free to. Thanks!

yihui avatar Feb 18 '22 22:02 yihui

Thanks for considering this @yihui! I'd be happy to have a go at making a PR. Hopefully sometime this weekend.

mikmart avatar Feb 19 '22 10:02 mikmart

I've been taking the lay of the land for this, and I've encountered an issue I'm not sure how to deal with.

In order for output_file to be a function that receives the rendered metadata, we'd need to avoid materializing it until after knitting has been done. For the most part, this isn't an issue. But there are two problematic uses pre-knitting:

  1. If output_file is a path that includes directories, and output_dir hasn't been specified, the directories in output_file are used to determine an output_dir going forward.
  2. Both output_dir and the basename() of the output_file are required to construct the files_dir used for saving images during knitting. If I understand correctly, files_dir needs to be known before knitting, and must remain unchanged after it, since the knitted document can include links that point to it.

I could see the first point being circumvented by e.g. using a temporary output_dir and moving to the final one once the dynamic output file name is known. However, for the second issue with files_dir I can't really see a neat solution.

Some options that I considered and dismissed:

  1. Use input to name files_dir? Would clash if many outputs are rendered from the same input, e.g. parameterized report.
  2. Use a random identifier for files_dir? Would cause an explosion of directories for re-renders.
  3. Use some appropriate hash for files_dir? Would be difficult for a user to determine relation between outputs and dirs.
  4. Use un-rendered metadata to create an output_file used to name files_dir? Most promising, but would at least need to sanitize R expressions to be suitable for use in directory names.
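
To illustrate what the sanitizing in option 4 might involve, here is a rough sketch. The function `sanitize_for_path()` and its replacement rule are assumptions made up for this example, not anything rmarkdown provides:

```r
# Hypothetical helper: replace every run of characters other than letters,
# digits, dots, underscores and hyphens with a single underscore, then trim
# leading and trailing underscores.
sanitize_for_path <- function(x) {
  x <- gsub("[^A-Za-z0-9._-]+", "_", x)
  gsub("^_+|_+$", "", x)
}

# Un-rendered metadata: the date field still holds the inline R expression.
metadata <- list(title = "Untitled", author = "Jane Doe", date = "`r Sys.Date()`")
raw_name <- with(metadata, paste(title, "by", author, "on", date))
sanitize_for_path(raw_name)
# "Untitled_by_Jane_Doe_on_r_Sys.Date"
```

The result is stable across re-renders, but it would no longer match the final output file name once the expression has been evaluated.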

@yihui do you have any thoughts on how we might proceed, in particular with the files_dir issue?

Some relevant code links: https://github.com/rstudio/rmarkdown/blob/72062edf39012ce2e08ff05ece100747e4e34b92/R/render.R#L493-L497 https://github.com/rstudio/rmarkdown/blob/72062edf39012ce2e08ff05ece100747e4e34b92/R/render.R#L504-L506

mikmart avatar Feb 19 '22 23:02 mikmart

Oh that sounds indeed much trickier than I thought... I didn't think much of the case in which inline R expressions are present in YAML.

In the past, we have received a lot of feature requests and bug reports on the arguments output_file, output_dir, intermediates_dir, etc. These are usually too complicated to deal with, and our common response has been like "first call render() to generate the default output file, and then rename/move it to the desired path". It's not easy to do everything inside render() alone.

In this case, I guess we may need to say the same. You can render(..., output_options = list(keep_md = TRUE)), read YAML from the .md output file, determine the desired filename, rename the output file, and delete the .md file. Instead of writing a custom output format, I think it should be easier to write a function that does these steps. You have pretty much already done it in your original post. Now the question is where this function should live. For us, the easiest way is that you create your own package. However, as I said, this has been a common feature request, so I can also consider hosting this function just inside rmarkdown, and you might set the knit field to something like

knit: >
  rmarkdown::render_file(
    function(metadata) {
      with(metadata, paste(title, "by", author, "on", date))
    }
  )

or even shorter if we provide another wrapper function

knit: >
  rmarkdown::render_meta(paste(title, "by", author, "on", date))

Does that sound okay to you?
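
The render, read, rename, and delete steps described above could be sketched roughly as follows. Both `renamed_path()` and `render_then_rename()` are hypothetical names. The sketch assumes that the output format accepts keep_md, and that the kept .md file sits next to the output file with the same base name and still contains the rendered YAML:

```r
# Hypothetical helper: same directory and extension, new base name.
renamed_path <- function(output_file, new_name) {
  ext <- tools::file_ext(output_file)
  if (nzchar(ext)) new_name <- paste0(new_name, ".", ext)
  file.path(dirname(output_file), new_name)
}

# Hypothetical wrapper around render(); name_fn receives the rendered metadata.
render_then_rename <- function(input_file, name_fn, ...) {
  output_file <- rmarkdown::render(
    input_file,
    output_options = list(keep_md = TRUE),
    ...
  )

  # Assumed location of the kept Markdown file.
  md_file <- paste0(tools::file_path_sans_ext(output_file), ".md")
  metadata <- rmarkdown::yaml_front_matter(md_file)

  new_file <- renamed_path(output_file, name_fn(metadata))
  if (!file.rename(output_file, new_file)) {
    stop("Could not rename ", output_file, " to ", new_file)
  }
  file.remove(md_file)
  invisible(new_file)
}
```

The files_dir created during knitting keeps its original name, so only the output file itself is renamed here.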

yihui avatar Feb 23 '22 16:02 yihui

Yeah, that sounds totally reasonable. I'm not sure I'd get around to making a package for this (at least picking a name for what would essentially be a single-function package would be a struggle), so it would be great if such a helper could live in rmarkdown.

Regarding the function:

I believe that the same issues w.r.t. files_dir will still apply with the render-and-rename approach, but it should be easier to document and draw attention to them in a separate function.

I think I'd like to be quite verbose with the name of a wrapper function, as it would have quite a lot to do. Something like render_to_dynamic_file() or actually maybe render_and_rename(). I'd strongly support the name-specifying argument being a function rather than an expression evaluated in the metadata data mask. That would make it much easier to program with, at the expense of slightly more verbose code for simple use.

However, it feels like it would be neater if the helper function didn't have to wrap render(), but could instead operate on the render result, following more of a single responsibility principle. That would be possible if render() would e.g. (have an option to) attach the rendered front matter as an attribute on the return result, kind of like rmd_output_metadata currently.

Then you could have something very explicit like:

output_file <- rmarkdown::render(input_file)
rename_with_metadata(output_file, function(metadata) {
  with(metadata, paste(title, "by", author, "on", date))
})

With the helper function signature something like:

rename_with_metadata <- function(file, callback, metadata = attr(file, "rmd_rendered_front_matter")) { ... }

What do you think about this kind of approach?
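
One possible body for that helper, as a sketch only. The `rmd_rendered_front_matter` attribute does not exist in render() today, so the example attaches it by hand to a temporary file to simulate a render result:

```r
rename_with_metadata <- function(file, callback,
                                 metadata = attr(file, "rmd_rendered_front_matter")) {
  # Checking metadata first forces the default argument while the
  # attribute is still attached to file.
  stopifnot(!is.null(metadata), is.function(callback))

  path <- as.character(file)  # plain path without attributes
  ext <- tools::file_ext(path)
  new_name <- callback(metadata)
  if (nzchar(ext)) new_name <- paste0(new_name, ".", ext)

  new_file <- file.path(dirname(path), new_name)
  if (!file.rename(path, new_file)) {
    stop("Could not rename ", path, " to ", new_file)
  }
  invisible(new_file)
}

# Simulated render() result carrying the rendered front matter.
output_file <- tempfile(fileext = ".html")
file.create(output_file)
attr(output_file, "rmd_rendered_front_matter") <-
  list(title = "Untitled", author = "Jane Doe")

new_file <- rename_with_metadata(output_file, function(metadata) {
  with(metadata, paste(title, "by", author))
})
basename(new_file)
# "Untitled by Jane Doe.html"
```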

mikmart avatar Feb 26 '22 12:02 mikmart

I actually ended up making a little PoC package that implements this approach:

# remotes::install_github("mikmart/rmdmatter")

output_file <- rmdmatter::render(input_file)
rmdmatter::rename_rendered(output_file, function(metadata) {
  with(metadata, paste(title, "by", author, "on", date))
})

Or with knit in front matter directly using a function that returns a rendering function:

---
title: "Untitled"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: html_document
knit: |
  rmdmatter::renaming_renderer(function(metadata) {
    with(metadata, paste(title, "by", author, "on", date))
  })
---

I suspect it might be a little fragile at the moment due to relying on keep_md in output_options, but it seems to be working.

mikmart avatar Feb 26 '22 21:02 mikmart