When using `--toc`, GFM output now contains raw HTML instead of Markdown syntax for the TOC

This is a follow-up to https://groups.google.com/g/pandoc-discuss/c/gQZrKunCvB4/m/j7177M-3BwAJ, which remained unanswered.

With recent Pandoc (here 2.18), we get this kind of output:

❯ ./pandoc --to gfm -f markdown --toc -s
# Head

Content
^Z
- <a href="#head" id="toc-head">Head</a>

# Head

Content

I believe this is a consequence of the change in #7907. Because ids are now added to all TOC elements, the writer falls back to raw HTML to keep the id in the Markdown output whenever raw_html is enabled.

Is it expected that enabling the TOC for Markdown output will now always produce HTML unless the raw_html extension is disabled?

❯ pandoc --to gfm-raw_html -f markdown --toc -s
# HEAD

Content
^Z
-   [HEAD](#head)

# HEAD

Content

I am not sure that adding an ID to TOC links by default is expected for Markdown output. For HTML, I guess it does not hurt. Maybe the ID on TOC links could be left off by default for Markdown output, or be part of an extension that can be disabled?

We have adapted to the TOC now being raw HTML, but I wanted to bring this change up for discussion and confirm that it is expected and not an unnoticed side effect.

Thanks.

Jun 17 '22 06:06 cderv

It's a consequence of #7907. I can see why this would be undesirable for markdown output if the markdown flavor doesn't allow you to encode the ID attribute.

I can think of a few potential solutions:

1. If the markdown flavor doesn't support attributes, then omit the id attribute on the links produced in the TOC.
2. Modify the writer for Link so that, if the link contains attributes, instead of falling back to HTML for the whole link, we wrap the link in raw HTML tags `<span ...attributes> .. </span>`. This would still be a bit ugly in gfm output, but less ugly.

Option 1 seems simplest, but it's hard to predict whether people are already relying on these anchors for their gfm output.
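
To make the two options concrete, here is a minimal Python sketch that post-processes gfm output of the shape shown earlier in this thread. It is only an illustration and not how pandoc implements (or would implement) either option: the function names `drop_toc_ids` and `wrap_toc_ids` and the regular expression are hypothetical, and the pattern assumes nothing beyond the `<a href="#..." id="toc-...">` form seen in the 2.18 output. The second function follows one possible reading of option 2, with the span wrapped around the Markdown link.

```python
import re

# Assumed shape of a TOC entry in pandoc 2.18 gfm output, taken from this thread:
#   <a href="#head" id="toc-head">Head</a>
# DOTALL lets the link text span several lines, as it does with --number-sections.
TOC_LINK = re.compile(r'<a href="(#[^"]*)" id="(toc-[^"]*)">(.*?)</a>', re.DOTALL)


def drop_toc_ids(markdown: str) -> str:
    """Option 1 (illustration): drop the id and emit a plain Markdown link."""
    return TOC_LINK.sub(lambda m: f"[{m.group(3)}]({m.group(1)})", markdown)


def wrap_toc_ids(markdown: str) -> str:
    """Option 2 (one reading): keep the id on a raw <span> around a Markdown link."""
    return TOC_LINK.sub(
        lambda m: f'<span id="{m.group(2)}">[{m.group(3)}]({m.group(1)})</span>',
        markdown,
    )


if __name__ == "__main__":
    entry = '- <a href="#head" id="toc-head">Head</a>'
    print(drop_toc_ids(entry))
    print(wrap_toc_ids(entry))
```

With option 1 the entry becomes an ordinary Markdown link and the `toc-head` anchor is lost; with option 2 the anchor survives, at the cost of some raw HTML remaining in the output.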

Jun 17 '22 15:06 jgm

Option 1 is indeed the simplest, and probably what I would have expected when I asked on pandoc-discuss. I should maybe have opened an issue here back then, since I discovered this when it was only in the nightly build. I understand that option 2 could be more desirable now in case people are already relying on the ids. Sorry for that.

Regarding option 2, do you mean something like the previous behavior with --toc --number-sections?

❯ ./pandoc --to gfm -f markdown --toc -s --number-sections
# Head

Content
^Z
-   [<span class="toc-section-number">1</span> Head](#head)

# Head

Content

With Pandoc 2.18, following the ID change, this is now:

❯ pandoc --to gfm -f markdown --toc -s -N
# Head

Content
^Z
-   <a href="#head" id="toc-head"><span class="toc-section-number">1</span>
    Head</a>

# Head

Content

If so, there would be two nested spans in this case:

-   [<span id="toc-head"><span class="toc-section-number">1</span> Head</span>](#head)

Unless we can merge them into one and put the ID toc-head on the number span.

Alternatively, --number-sections could add no numbers at all here, since the headers in the body won't be numbered and only the TOC entries will be (I believe --number-sections does not really work for GFM).

Jun 17 '22 16:06 cderv

I'd strongly prefer the first option; the switch to a new epoch version seems like a good opportunity to introduce some minor breakage. Besides, I'd be quite surprised to learn that people rely on this when targeting gfm.

Dec 18 '22 10:12 tarleb

Yes, the first option seems best to me too.

Dec 18 '22 15:12 jgm

If #8485 gets merged in its current form then the old behavior could be restored with a custom writer:

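Template = pandoc.template.default 'markdown'

function Writer (doc, opts)
  -- Build the table of contents from the document's headings.
  local toc = pandoc.structure.table_of_contents(doc)
  -- Render the TOC as gfm and pass it to the template as a variable.
  opts.variables['table-of-contents'] =
    pandoc.write(pandoc.Pandoc{toc}, 'gfm')
  -- Render the document itself with the gfm writer.
  return pandoc.write(doc, 'gfm', opts)
end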

Dec 20 '22 17:12 tarleb
