rmarkdown
Allow `output_file` to be a function in `render()` [FR]
It would be useful to have an easy way to reuse metadata included in the YAML front matter of an RMarkdown file for generating output file names when rendering. This originally came up on StackOverflow.
At the moment it's simple enough to do with `yaml_front_matter()` if the metadata doesn't include R expressions. Just extract the metadata from the RMarkdown file, and build the output file name.
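For the static case, a minimal sketch might look like the following. The helper names `build_output_name()` and `render_with_static_name()` are made up for illustration; only `yaml_front_matter()` and `render()` come from rmarkdown. It assumes `render()` appends the format's extension when `output_file` has none.

```r
# Hypothetical helper: turn a metadata list into a file name (no extension).
build_output_name <- function(metadata) {
  with(metadata, paste(title, "by", author))
}

# Hypothetical wrapper: read the front matter exactly as written in the file
# (inline R expressions are NOT evaluated), then render under that name.
render_with_static_name <- function(input_file) {
  metadata <- rmarkdown::yaml_front_matter(input_file)
  rmarkdown::render(input_file, output_file = build_output_name(metadata))
}
```

A field such as ``date: "`r Sys.Date()`"`` would come back as the literal, unevaluated string here, which is why this only works for static metadata.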
However, it becomes quite a bit more complicated if the metadata fields contain expressions that need to be rendered first. One way to do that is to keep the intermediate Markdown file, and extract the metadata from there. For example:
```yaml
---
title: "Untitled"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: word_document
knit: >
  (function(input_file, encoding) {
    # Render, keeping intermediate files for extracting front matter
    md_dir <- tempdir()
    output_file_temp <- rmarkdown::render(
      input = input_file,
      output_file = tempfile(),
      intermediates_dir = md_dir,
      clean = FALSE
    )
    # Get the rendered front matter from the intermediate Markdown file
    md_file <- fs::path_ext_set(fs::path_file(input_file), ".knit.md")
    metadata <- rmarkdown::yaml_front_matter(fs::path(md_dir, md_file))
    # Build output file name based on rendered metadata
    output_name <- with(metadata, paste(title, "by", author, "on", date))
    output_ext <- fs::path_ext(output_file_temp)
    output_file <- fs::path_ext_set(output_name, output_ext)
    fs::file_move(output_file_temp, output_file)
  })
---
```
Would it be possible to allow the `output_file` argument to `render()` to be a function that gets the rendered metadata as an argument? The above could then become:
```yaml
---
title: "Untitled"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: word_document
knit: >
  (function(input_file, encoding) {
    rmarkdown::render(
      input = input_file,
      output_file = function(metadata) {
        with(metadata, paste(title, "by", author, "on", date))
      }
    )
  })
---
```
And the resulting output file would be named `Untitled by Jane Doe on 2022-02-18.docx`.
I would suggest defining a custom format and tweaking `post_processor` with `rmarkdown::output_format()`.
In this way, you do not have to write `knit` in the YAML front matter.
Oh thanks for pointing that out! `post_processor` seems like exactly the kind of thing to solve this.
Needing to define a whole new output format seems like a bit much though, considering that this has nothing to do with the actual file format. You'd basically need to write a wrapper format that exposes the `post_processor` argument for each actual output format that you want to use (docx, pdf, html, etc.).
Maybe there'd be a way to add a function that takes an existing format and adds this functionality...
That sounds like a reasonable feature request, and is indeed a common request, too. I agree that defining a new output format is too much only for this purpose.
I feel this should not be hard to implement. If anyone wants to submit a pull request, please feel free to. Thanks!
Thanks for considering this @yihui! I'd be happy to give a go at making a PR. Hopefully sometime this weekend.
I've been taking the lay of the land for this, and I've encountered an issue I'm not sure how to deal with.
In order for `output_file` to be a function that receives the rendered metadata, we'd need to avoid materializing it until after knitting has been done. For the most part, this isn't an issue. But there are two problematic uses pre-knitting:
- If `output_file` is a path that includes directories, and `output_dir` hasn't been specified, the directories in `output_file` are used to determine an `output_dir` going forward.
- Both `output_dir` and the `basename()` of the `output_file` are required to construct the `files_dir` used for saving images during knitting. If I understand correctly, `files_dir` needs to be known before knitting, and must remain unchanged after it, since the knitted document can include links that point to it.
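To make the dependency concrete, here is a simplified illustration of how a `files_dir` could be derived from those two inputs. This is not the code from `render.R`; `sketch_files_dir()` is a made-up name, and the `_files` suffix is assumed from the directory names rmarkdown typically produces.

```r
# Hypothetical, simplified stand-in for what render() does internally.
sketch_files_dir <- function(output_file, output_dir = NULL) {
  # With no explicit output_dir, fall back to the directory part of output_file
  if (is.null(output_dir)) {
    output_dir <- dirname(output_file)
  }
  # The directory name is built from the output file's stem
  stem <- tools::file_path_sans_ext(basename(output_file))
  file.path(output_dir, paste0(stem, "_files"))
}

sketch_files_dir("reports/summary.html")
# "reports/summary_files"
sketch_files_dir("summary.html", output_dir = "out")
# "out/summary_files"
```

Because the stem feeds the directory name, an `output_file` that is only resolved after knitting leaves nothing to compute it from at the point where images need to be written.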
I could see the first point being circumvented by e.g. using a temporary `output_dir` and moving to the final one once the dynamic output file name is known. However, for the second issue with `files_dir` I can't really see a neat solution.
Some options that I considered and dismissed:
- Use `input` to name `files_dir`? Would clash if many outputs are rendered from the same input, e.g. a parameterized report.
- Use a random identifier for `files_dir`? Would cause an explosion of directories for re-renders.
- Use some appropriate hash for `files_dir`? Would be difficult for a user to determine the relation between outputs and dirs.
- Use un-rendered metadata to create an `output_file` used to name `files_dir`? Most promising, but would at least need to sanitize R expressions to be suitable for use in directory names.
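For that last option, the sanitizing step could be as simple as replacing anything outside a conservative character set. A rough sketch, where `sanitize_for_dir()` is a hypothetical helper and the allowed character set is an arbitrary choice:

```r
# Hypothetical helper: make a raw metadata string safe for a directory name.
sanitize_for_dir <- function(x) {
  x <- gsub("[^A-Za-z0-9._-]+", "_", x)  # collapse runs of unsafe characters
  gsub("^_+|_+$", "", x)                 # trim leading/trailing underscores
}

sanitize_for_dir("Untitled by Jane Doe on `r Sys.Date()`")
# "Untitled_by_Jane_Doe_on_r_Sys.Date"
```

The result is stable across re-renders, but it no longer matches the final output file name, so the relation between output and directory would still need documenting.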
@yihui do you have any thoughts on how we might proceed, in particular with the `files_dir` issue?
Some relevant code links:
- https://github.com/rstudio/rmarkdown/blob/72062edf39012ce2e08ff05ece100747e4e34b92/R/render.R#L493-L497
- https://github.com/rstudio/rmarkdown/blob/72062edf39012ce2e08ff05ece100747e4e34b92/R/render.R#L504-L506
Oh that sounds indeed much trickier than I thought... I didn't think much of the case in which inline R expressions are present in YAML.
In the past, we have received a lot of feature requests and bug reports on the arguments `output_file`, `output_dir`, `intermediates_dir`, etc. These are usually too complicated to deal with, and our common response has been something like "first call `render()` to generate the default output file, and then rename/move it to the desired path". It's not easy to do everything inside `render()` alone.
In this case, I guess we may need to say the same. You can call `render(..., output_options = list(keep_md = TRUE))`, read YAML from the `.md` output file, determine the desired filename, rename the output file, and delete the `.md` file. Instead of writing a custom output format, I think it should be easier to write a function that does these steps. You have pretty much already done it in your original post. Now the question is where this function should live. For us, the easiest way is that you create your own package. However, as I said, this has been a common feature request, so I can also consider hosting this function just inside rmarkdown, and you might set the `knit` field to something like
```yaml
knit: >
  rmarkdown::render_file(
    function(metadata) {
      with(metadata, paste(title, "by", author, "on", date))
    }
  )
```
or even shorter if we provide another wrapper function
```yaml
knit: >
  rmarkdown::render_meta(paste(title, "by", author, "on", date))
```
Does that sound okay to you?
Yeah, that sounds totally reasonable. I'm not sure I'd get around to making a package for this (at least picking a name for what would essentially be a single-function package would be a struggle), so it would be great if such a helper could live in rmarkdown.
Regarding the function:
I believe that the same issues w.r.t. `files_dir` will still apply with the render-and-rename approach, but it should be easier to document and draw attention to them in a separate function.
I think I'd like to be quite verbose with the name of a wrapper function, as it would have quite a lot to do. Something like `render_to_dynamic_file()` or actually maybe `render_and_rename()`. I'd strongly support the name-specifying argument being a function rather than an expression evaluated in the metadata data mask. That would make it much easier to program with, at the expense of slightly more verbose code for simple use.
However, it feels like it would be neater if the helper function didn't have to wrap `render()`, but could instead operate on the render result, following more of a single responsibility principle. That would be possible if `render()` would e.g. (have an option to) attach the rendered front matter as an attribute on the return result, kind of like `rmd_output_metadata` currently.
Then you could have something very explicit like:
```r
output_file <- rmarkdown::render(input_file)
rename_with_metadata(output_file, function(metadata) {
  with(metadata, paste(title, "by", author, "on", date))
})
```
With the helper function signature something like:
```r
rename_with_metadata <- function(file, callback, metadata = attr(file, "rmd_rendered_front_matter")) { ... }
```
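To give a rough idea of the body, here is one way that signature could be filled in using only base R. This is a sketch: the `rmd_rendered_front_matter` attribute is the proposal from this comment, not something `render()` currently sets, and keeping the original directory and extension is an assumption about the desired behaviour.

```r
# Sketch only: relies on the proposed (not yet existing) attribute on `file`.
rename_with_metadata <- function(file, callback,
                                 metadata = attr(file, "rmd_rendered_front_matter")) {
  if (is.null(metadata)) {
    stop("No rendered front matter available for ", file)
  }
  # Keep the original directory and extension; only the stem changes
  ext <- tools::file_ext(file)
  new_name <- callback(metadata)
  if (nzchar(ext)) {
    new_name <- paste0(new_name, ".", ext)
  }
  new_file <- file.path(dirname(file), new_name)
  if (!file.rename(file, new_file)) {
    stop("Could not rename ", file, " to ", new_file)
  }
  invisible(new_file)
}
```

Passing `metadata` explicitly would also let the helper work with front matter obtained some other way, e.g. read from a kept `.md` file.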
What do you think about this kind of approach?
I actually ended up making a little PoC package that implements this approach:
```r
# remotes::install_github("mikmart/rmdmatter")
output_file <- rmdmatter::render(input_file)
rmdmatter::rename_rendered(output_file, function(metadata) {
  with(metadata, paste(title, "by", author, "on", date))
})
```
Or with `knit` in front matter directly, using a function that returns a rendering function:
```yaml
---
title: "Untitled"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: html_document
knit: |
  rmdmatter::renaming_renderer(function(metadata) {
    with(metadata, paste(title, "by", author, "on", date))
  })
---
```
I suspect it might be a little fragile at the moment due to relying on `keep_md` in `output_options`, but it seems to be working.