pandoc
With `--toc`, GFM now outputs raw HTML instead of Markdown syntax for the TOC
This is a follow-up to https://groups.google.com/g/pandoc-discuss/c/gQZrKunCvB4/m/j7177M-3BwAJ, which remained unanswered.
With recent Pandoc (here 2.18), we get this kind of output:
❯ ./pandoc --to gfm -f markdown --toc -s
# Head
Content
^Z
- <a href="#head" id="toc-head">Head</a>
# Head
Content
I believe this is a consequence of the change in #7907. Because that change adds ids to all TOC elements, the writer falls back to raw HTML to preserve the id in Markdown output whenever raw_html is enabled.
Is it expected that enabling the TOC for Markdown output will now always produce raw HTML unless the raw_html extension is disabled?
❯ pandoc --to gfm-raw_html -f markdown --toc -s
# HEAD
Content
^Z
- [HEAD](#head)
# HEAD
Content
I am not sure that adding an ID to TOC links by default is expected for Markdown output. For HTML, I guess it does not hurt. Maybe the ID could be left off by default for Markdown output, or be controlled by an extension that can be disabled?
We adapted to the TOC now being raw HTML, but I wanted to bring this change up for discussion and confirm that it is expected and not an unnoticed side effect.
Thanks.
It's a consequence of #7907. I can see why this would be undesirable for markdown output if the markdown flavor doesn't allow you to encode the ID attribute.
I can think of a few potential solutions:
1. If the markdown flavor doesn't support attributes, then omit the id attribute on the links produced in the TOC.
2. Modify the writer for Link so that, if the link contains attributes, instead of falling back to HTML for the whole link, we wrap the link in raw HTML tags `<span ...attributes>..</span>`. This would still be a bit ugly in gfm output, but less ugly.
1 seems simplest, but it's hard to predict whether people are already relying on these anchors for their gfm output.
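To make the trade-off concrete, here is a minimal Python sketch of the two proposals next to the behavior reported above. It is a toy model only, not pandoc's Haskell writer: the `Link` class and the `render_*` functions are hypothetical names invented for illustration, and option 2 is shown with the span placed inside the link text, which is just one possible reading of the proposal.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Link:
    """Hypothetical stand-in for a link node: text, target, attributes."""
    text: str
    target: str
    attrs: dict = field(default_factory=dict)


def _html_attrs(attrs):
    # Render attributes as ' key="value"' pairs for a raw HTML tag.
    return "".join(f' {k}="{escape(v, quote=True)}"' for k, v in attrs.items())


def _md_link(text, target):
    # Plain Markdown link syntax, with no place to store attributes.
    return f"[{text}]({target})"


def render_current(link):
    """Reported behavior: any attribute forces a raw <a> element."""
    if link.attrs:
        href = escape(link.target, quote=True)
        return f'<a href="{href}"{_html_attrs(link.attrs)}>{link.text}</a>'
    return _md_link(link.text, link.target)


def render_option_1(link):
    """Option 1: silently drop attributes the flavor cannot express."""
    return _md_link(link.text, link.target)


def render_option_2(link):
    """Option 2 (one reading): keep the Markdown link, wrap its text in a span."""
    if link.attrs:
        wrapped = f"<span{_html_attrs(link.attrs)}>{link.text}</span>"
        return _md_link(wrapped, link.target)
    return _md_link(link.text, link.target)


toc_link = Link("Head", "#head", {"id": "toc-head"})
print(render_current(toc_link))   # <a href="#head" id="toc-head">Head</a>
print(render_option_1(toc_link))  # [Head](#head)
print(render_option_2(toc_link))  # [<span id="toc-head">Head</span>](#head)
```

Under this model, option 1 gives the cleanest Markdown but loses the `toc-head` anchor, while option 2 keeps the anchor at the cost of some inline raw HTML.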
Option 1 is indeed the simplest, and it is probably what I would have expected when I asked on pandoc-discuss. Maybe I should have opened an issue here back then, since I discovered this while it was only in the nightly build. I understand that option 2 could be more desirable now in case people are already relying on the anchors. Sorry for that.
Regarding 2, do you mean something like how it was before with --toc --number-sections?
❯ ./pandoc --to gfm -f markdown --toc -s --number-sections
# Head
Content
^Z
- [<span class="toc-section-number">1</span> Head](#head)
# Head
Content
With Pandoc 2.18, following the ID change, this is now:
❯ pandoc --to gfm -f markdown --toc -s -N
# Head
Content
^Z
- <a href="#head" id="toc-head"><span class="toc-section-number">1</span>
Head</a>
# Head
Content
If so, there would be two nested spans in this case:
- [<span id="toc-head"><span class="toc-section-number">1</span> Head</span>](#head)
Unless we can merge them into one and put the ID toc-head on the number span, or unless --number-sections should not add any numbers at all, since the headers in the body won't be numbered and only the TOC entries will (--number-sections does not really work for GFM, I believe).
I'd strongly prefer the first option; the switch to a new epoch version seems like a good opportunity to introduce some minor breakage. Besides, I'd be quite surprised to learn that people rely on this when targeting gfm.
Yes, the first option seems best to me too.
If #8485 gets merged in its current form then the old behavior could be restored with a custom writer:
Template = pandoc.template.default 'markdown'
function Writer (doc, opts)
  local toc = pandoc.structure.table_of_contents(doc)
  opts.variables['table-of-contents'] =
    pandoc.write(pandoc.Pandoc{toc}, 'gfm')
  return pandoc.write(doc, 'gfm', opts)
end