Missing support for subscripts and superscripts in GitHub-flavored Markdown (GFM)

Open thilomaurer opened this issue 1 year ago • 1 comments

Converting subscripts and superscripts from GFM fails for every target format except HTML. It seems the tags <sub> and <sup> are not parsed into subscript and superscript elements in the AST, as the output for the native target below shows.

GFM input gfm.md

foo<sub>14:13</sub>bar

Examples:

  • LaTeX (bad result)
    $ podman run --rm --volume "$(pwd):/data:Z" pandoc/core -f gfm gfm.md -t latex
    foo14:13bar
    
  • pandoc-Markdown (bad result)
    $ podman run --rm --volume "$(pwd):/data:Z" pandoc/core -f gfm gfm.md -t Markdown
    foo`<sub>`{=html}14:13`</sub>`{=html}bar
    
  • HTML (good result)
    $ podman run --rm --volume "$(pwd):/data:Z" pandoc/core -f gfm gfm.md -t html
    <p>foo<sub>14:13</sub>bar</p>
    
  • native (bad result)
    $ podman run --rm --volume "$(pwd):/data:Z" pandoc/core -f gfm gfm.md -t native
    [ Para
     [ Str "foo"
     , RawInline (Format "html") "<sub>"
     , Str "14:13"
     , RawInline (Format "html") "</sub>"
     , Str "bar"
     ]
    ]
    
    

Pandoc version

podman run --rm --volume "$(pwd):/data:Z" pandoc/core --version
pandoc 3.1.1
[...]

thilomaurer avatar Sep 11 '23 07:09 thilomaurer

Raw HTML is always parsed as a RawInline or RawBlock element in the AST in pandoc's markdown variants.

I suppose we could consider an exception for gfm, which lacks any native syntax for sub/superscript. Because the gfm reader is based on commonmark, this would require an additional filtering pass over the inline lists.

Indeed, you could implement this with a Lua filter.

-- Turn paired <sub>...</sub> and <sup>...</sup> raw HTML tags into
-- Subscript and Superscript elements.
function Inlines(ils)
  local result = {}
  -- Stack of unclosed tags; each entry is {opening tag, inlines collected so far}.
  local openers = {}
  for _, il in ipairs(ils) do
    local html = il.t == "RawInline" and il.format == "html"
    local top = openers[#openers]
    local new = nil
    if html and (il.text == "<sup>" or il.text == "<sub>") then
      -- Opening tag: start collecting its contents.
      table.insert(openers, {il, {}})
    elseif html and il.text == "</sup>" and top and top[1].text == "<sup>" then
      local contents = table.remove(openers)
      new = pandoc.Superscript(contents[2])
    elseif html and il.text == "</sub>" and top and top[1].text == "<sub>" then
      local contents = table.remove(openers)
      new = pandoc.Subscript(contents[2])
    else
      -- Anything else, including an unmatched closing tag, passes through.
      new = il
    end
    if new then
      if #openers > 0 then
        -- Inside an open tag: append to the innermost opener's contents.
        table.insert(openers[#openers][2], new)
      else
        table.insert(result, new)
      end
    end
  end
  -- Tags that were never closed: emit them and their contents unchanged,
  -- outermost first so the original inline order is preserved.
  for _, contents in ipairs(openers) do
    table.insert(result, contents[1])
    for _, il in ipairs(contents[2]) do
      table.insert(result, il)
    end
  end
  return result
end
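
To try the filter on the sample input from the issue, something like the following should work. This is a minimal sketch: the file name subsup.lua is an assumption (save the filter above under any name you like), and the guard around the pandoc call is only there so the script does not fail when pandoc or the filter file is missing.

```shell
#!/bin/sh
# Sketch: run the Lua filter above against the issue's sample input.
# Assumption: the filter is saved as subsup.lua (a hypothetical file name).
filter=subsup.lua
input=gfm.md

# Recreate the sample input from the issue.
printf '%s\n' 'foo<sub>14:13</sub>bar' > "$input"

# Only call pandoc when both it and the filter file are available.
if command -v pandoc >/dev/null 2>&1 && [ -f "$filter" ]; then
  pandoc -f gfm "$input" -t native --lua-filter="$filter"
else
  echo "pandoc or $filter not found; skipping conversion" >&2
fi
```

If the filter matches the tags, the native output should contain a Subscript element in place of the two RawInline elements, and writers such as LaTeX should then emit their own subscript markup. With the containerized setup from the issue, the same arguments can presumably be appended after pandoc/core, since the current directory is mounted as the container's data directory.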

jgm avatar Sep 11 '23 15:09 jgm