
Has rmarkdown rendering of images included via markdown with alt text changed? How to avoid md markup conversion to html?

Open gavinsimpson opened this issue 11 months ago • 2 comments

A couple of days ago I noticed that my README.md that is automatically built from the README.Rmd in my repo via GH Actions had changed:

https://github.com/gavinsimpson/gratia/commit/be93b193b8092517d9d75d81e2b565da801efb46

Instead of the Markdown image syntax from the .Rmd source being preserved, the images are being replaced by HTML with a figure caption. These captions are now rendered (obviously) on GitHub, and it's annoying.

I had a quick look at Pandoc's recent release notes and didn't see anything related.

I looked at rmarkdown's NEWS.md and saw that there had been a recent release but nothing there seems related to this.

I can't reproduce this behaviour locally but my machine is running an older but supported LTS of Ubuntu and I have an old version of Pandoc (apt reports Version: 2.5-3build2). The version of Pandoc used on GH Actions (on ubuntu-latest) seems to have changed recently: it now reports 3.1.11.

Here's a recent GH Action run: https://github.com/gavinsimpson/gratia/actions/runs/8172627146/job/22343269488#step:7:34

If this is due to a change in Pandoc, are there ways to run the renderer to turn off this feature?

gavinsimpson, Mar 06 '24 13:03

> Here's a recent GH Action run: https://github.com/gavinsimpson/gratia/actions/runs/8172627146/job/22343269488#step:7:34

The exact GHA run that committed the changes is this one: https://github.com/gavinsimpson/gratia/actions/runs/8142489615

Looking at the session info of this run and the previous run, I noticed the Pandoc version had changed from 2.19.2 to 3.1.11. Then I went to r-lib/actions and found they changed the default Pandoc version recently: https://github.com/r-lib/actions/commit/415fb3e4a23a024626cac05cdda8900398dbf688

So this change doesn't have anything to do with the rmarkdown package. If you want the old behavior, you can at least stay with Pandoc 2.19.2: https://github.com/r-lib/actions/tree/v2-branch/setup-pandoc#usage There might be a Pandoc option to control this behavior of the Markdown output, but I'm not sure.

yihui, Mar 06 '24 15:03

Pandoc 3.0 is the version that introduced this change:

```r
library(pandoc)

releases <- pandoc::pandoc_available_releases()
#> ℹ Fetching Pandoc releases info from github...
# keep only the versions from 2.19.2 through 3.1.11
range <- releases[seq.int(which(releases == "2.19.2"), which(releases == "3.1.11"))]

# install all versions to test
purrr::walk(range, \(x) suppressMessages(pandoc_install(version = x)))

# find the first version whose GFM output starts with a <figure> element
breaking <- NULL
for (ver in range) {
  res <- pandoc_convert(text = "![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)", to = "gfm", version = ver)
  if (grepl("figure", res[1])) {
    breaking <- ver
    break
  }
}
breaking
#> [1] "3.0"
```

See the Figure change in the Pandoc 3.0 release notes:

> Markdown writer: figures are output as implicit figures if possible, via HTML if the raw_html extension is enabled, and as Div elements otherwise.

Opting out with `-raw_html` works:

```r
pandoc::pandoc_convert(text = "![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)", to = "gfm-raw_html", version = "3.1.11")
#> ![Estimated smooths from a
#> GAM](man/figures/README-draw-gam-figure-1.png)
#> 
#> Estimated smooths from a GAM
```

To set that with rmarkdown, you need to either:

  • Use md_document() so that you can tweak the variant. github_document() does not allow that (though we could adapt it by adding a new feature such as a variant_extension argument; cc @yihui)

  • Keep a pre-3.0 pandoc version to use with rmarkdown::render(): https://bookdown.org/yihui/rmarkdown-cookbook/install-pandoc.html
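A minimal sketch of the first option, assuming the rmarkdown package and a Pandoc binary are available: render with `md_document()` using a `gfm` variant that has `raw_html` turned off. The temporary `.Rmd` file is purely illustrative; in a real repository you would pass `README.Rmd` instead.

```r
library(rmarkdown)

# Illustrative input: an image with alt text alone in its paragraph,
# which Pandoc treats as an implicit figure
rmd <- tempfile(fileext = ".Rmd")
writeLines(
  "![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)",
  rmd
)

# gfm without raw_html, so Pandoc cannot write the figure as <figure> HTML
out <- render(
  rmd,
  output_format = md_document(variant = "gfm-raw_html"),
  quiet = TRUE
)

md <- readLines(out)
any(grepl("<figure", md, fixed = TRUE))
```

The trade-off is that `-raw_html` applies to the whole output, so other raw HTML in the README is likely dropped as well, and with Pandoc 3 the caption can still appear as a separate plain-text paragraph, as in the `-raw_html` output shown earlier.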

However, I don't think this is necessary or desirable. Let me explain.

Using the HTML version of figures should work fine. For full context, the output you get is what the Markdown syntax you used is supposed to produce. The new Figure element improves the output by making a figure an actual figure in the output.

> In place of the markdown syntax for images being preserved from the .Rmd source, these are being replaced by HTML with a figure caption. These caption are now being rendered (obviously) inside GitHub and it's annoying.

The Markdown syntax you used

```
![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)
```

sets a caption. See the syntax description: https://pandoc.org/MANUAL.html#extension-implicit_figures

> An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption.

If you don't want a caption, leave the alt-text part empty: `![](man/figures/README-draw-gam-figure-1.png)`

If you don't want a figure at all, follow the advice in the manual:

> If you just want a regular inline image, just make sure it is not the only thing in the paragraph. One way to do this is to insert a nonbreaking space after the image:
>
> ```
> ![This image won't be a figure](/url/of/image.png)\
> ```

```r
pandoc::pandoc_convert(text = r"(![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)\)", to = "gfm", version = "3.1.11")
#> ![Estimated smooths from a
#> GAM](man/figures/README-draw-gam-figure-1.png)
```

(Note: `r"()"` is R's raw string syntax, which avoids having to escape characters such as the backslash: https://r4ds.hadley.nz/strings.html#sec-raw-strings)

You can also opt out of implicit figures by disabling the extension in the input format:

```r
pandoc::pandoc_convert(text = "![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)", to = "gfm", from = "markdown-implicit_figures", version = "3.1.11")
#> ![Estimated smooths from a
#> GAM](man/figures/README-draw-gam-figure-1.png)
```
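The same input-side opt-out can probably be applied from rmarkdown without leaving `github_document()`: as far as I understand, its `md_extensions` argument is appended to the input (`from`) format that rmarkdown passes to Pandoc. A minimal sketch under that assumption, again with an illustrative temporary file standing in for `README.Rmd`:

```r
library(rmarkdown)

# Illustrative input: the same standalone image with alt text
rmd <- tempfile(fileext = ".Rmd")
writeLines(
  "![Estimated smooths from a GAM](man/figures/README-draw-gam-figure-1.png)",
  rmd
)

# Assumption: md_extensions modifies the *input* format, so
# "-implicit_figures" mirrors from = "markdown-implicit_figures" above
out <- render(
  rmd,
  output_format = github_document(
    md_extensions = "-implicit_figures",
    html_preview = FALSE
  ),
  quiet = TRUE
)

md <- readLines(out)
any(grepl("<figure", md, fixed = TRUE))
```

Unlike `-raw_html`, this only changes how the input is parsed, so other raw HTML in the README should be untouched, and the alt text stays alt text instead of becoming a caption.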

In your example, I don't know what "Estimated smooths from a GAM" is supposed to be. If it is meant only as alt text, you can also use knitr for that:

```{r}
#| fig.alt: Estimated smooths from a GAM
knitr::include_graphics("man/figures/README-draw-gam-figure-1.png", error = FALSE)
```

will output

```html
<img src="man/figures/README-draw-gam-figure-1.png" alt="Estimated smooths from a GAM"  />
```

in both the intermediate Markdown file and the final output.

Hope this clarifies things and helps.

cderv, Mar 07 '24 10:03

Thanks for this @yihui and especially @cderv for the in-depth analysis and explanation - very helpful indeed!

Pandoc's use of alt text as a caption goes against the definition of alt text in an HTML context. It's supposed to describe what the figure shows if the figure can't be shown for some reason or it can't be viewed (e.g. via a screen reader). Leaving aside the fact that my alt text wasn't a very good description of the figure (I guess I wrote it as a placeholder for something and never went back to provide something more useful), a good alt text would normally not be a good caption for a figure. This change is even more egregious though as markdown already allows a title in the image markup:

```
![The San Juan Mountains are beautiful!](/assets/images/san-juan-mountains.jpg "San Juan Mountains")
```

which could have been used for a caption (although I understand this wouldn't allow any markup, since the title is plain text, which is why the Pandoc devs went with the alt-text-as-caption option in the first place).

None of this is rmarkdown's problem though; I should probably go vent (i.e. comment politely) at the pandoc devs, after deleting my alt texts to fix my original problem.

gavinsimpson, May 15 '24 12:05

I completely agree with you that alt text should not be used as the caption. In a new R Markdown package that I've been developing recently, I've made it clear that fig.cap is for caption, and fig.alt is for alt text; fig.alt will never be used as fig.cap (but fig.cap could be used as fig.alt if the latter is not provided).

yihui, May 15 '24 13:05