pandoc
org-mode: internal links are interpreted & rendered as emphasis
Internal links in org-mode input are interpreted incorrectly and rendered as emphasis in HTML & Markdown output.
example
tested using pandoc 2.10.1
org input
> cat test.org
* Some section
* section
* sectiox
[[#some-section]]
[[some section]]
[[Some section]]
[[*Some section]]
[[some section][some section]]
[[Some section][Some section]]
[[*Some section][Some section]]
md output
> pandoc --to markdown test.org
Some section
============
section {#section-1}
=======
sectiox
=======
[\#some-section](#some-section) [some section](#some-section)
*some section* *Some section* *\*Some section*
*some section* *Some section* *Some section*
html output
> pandoc --to html test.org
<h1 id="some-section">Some section</h1>
<h1 id="section-1">section</h1>
<h1 id="sectiox">sectiox</h1>
<p><a href="#some-section">#some-section</a> <a href="#some-section">some section</a></p>
<p><em>some section</em> <em>Some section</em> <em>*Some section</em></p>
<p><em>some section</em> <em>Some section</em> <em>Some section</em></p>
expected
All of the above should be interpreted & rendered as links instead of emphasis.
On a related note, I found a minor issue while testing: a section titled "Section" gets the ID #section-1 instead of #section.
related: #6917
a section titled "Section" gets the ID #section-1 instead of #section
This may be because you have another section with the same name? Pandoc ensures that the ids are unique.
No, this already happens when the file contains only one section with that title, but it doesn't happen with other section titles:
> cat test.org
* section
* sectiox
> pandoc --to html5 test.org
<h1 id="section-1">section</h1>
<h1 id="sectiox">sectiox</h1>
> cat test.md
# section
# sectiox
> pandoc --to html5 test.md
<h1 id="section">section</h1>
<h1 id="sectiox">sectiox</h1>
We must have submitted our issues within moments of one another. lol Anyway, I closed my (very closely related) issue #6917 as you seem to have beaten me to the submit button by a nose. :smile:
In my case, the in-page link:
[[*Setup][Setup]]
...is not being rendered as a link at all. Instead it gets converted to italicised text:
*Setup*
But that's just a different way of restating what you already wrote above. I am pretty sure this is all the same issue. As discussed in my post to the mailing list, @tarleb seems to have already confirmed this is indeed an issue with the Org reader.
Yes, it's the same issue with Markdown output, and also with links that have a description, so the issue is indeed likely in the reader. I updated the issue title & description accordingly.
I noticed the recent commit (referenced immediately above this post) which seems to address this issue. Thanks a lot @tarleb! :+1:
So I compiled the latest version from source to test. Running the command again:
pandoc -f org -t commonmark README.org
...on the exact same file (and example) as mentioned further up the thread, the relevant output is now:
<span class="spurious-link" target="*Setup">*Setup*</span>
Which I suppose addresses at least part of the problem. However, I am guessing that something else needs to be done "on the other (Markdown) side" to arrive at the output I am expecting (i.e., a link which will be formatted in Markdown)?
Maybe that is even, strictly speaking, outside the scope of this particular issue.
tarleb mentions in the commit notes:
This allows to recover and fix broken or unknown links with filters.
However, being fairly new to pandoc, I must confess to being a bit lost as to what this bit about "filters" means. If I just need to go RTFM I will be happy to do so; however, is there some additional functionality which still needs to be implemented before this will work as I am hoping?
The remaining step is to teach the Org reader to recognize the link type and properly link to the header. The nice thing about the small change that I made is that it makes it possible to build workarounds with filters. Below is a hacky Lua filter which finds and fixes links of type [[*Some section][Some section]]. But we'll still need to do it cleanly and integrate it into the Org reader.
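local stringify = pandoc.utils.stringify

-- Maps each header's plain text to its identifier.
local headers = {}

function collect (header)
  headers[stringify(header)] = header.identifier
end

-- Turns a "spurious-link" span into a link when its target names a known header.
function fix_spurious_link (span)
  if span.classes:includes 'spurious-link' then
    -- The span wraps an Emph element; reuse its content as the link text.
    local content = span.content[1].content
    local target = span.attributes.target
    -- Drop the leading '*' of targets such as "*Some section".
    local header_target = headers[target:sub(2)]
    if header_target then
      return pandoc.Link(content, '#' .. header_target)
    end
  end
end

-- Run the header pass first so the table is filled before spans are visited.
return {
  {Header = collect},
  {Span = fix_spurious_link}
}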
Did this ever get added?
I wasn't able to use this script without modification. I removed the :sub(2) and it worked.
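For anyone hitting the same problem, here is a minimal sketch of a variant that should cover both cases: it strips a leading asterisk only when the target actually has one. This rests on an assumption not confirmed anywhere in this thread, namely that the filter needed the change because the `target` attribute did not start with `*` for those links. The helper name `heading_text` and the fallback for spans whose first element is not emphasis are made up for illustration. Header matching is still exact and case-sensitive.

```lua
-- Sketch of a more tolerant variant of the filter posted above.
-- Assumption: the `target` attribute may or may not start with '*'.
local stringify = pandoc.utils.stringify

-- Maps each header's plain text to its identifier.
local headers = {}

-- Hypothetical helper: remove one leading '*' if present.
-- The parentheses discard gsub's second return value (the match count).
local function heading_text (target)
  return (target:gsub('^%*', ''))
end

local function collect (header)
  headers[stringify(header)] = header.identifier
end

local function fix_spurious_link (span)
  if not span.classes:includes 'spurious-link' then
    return nil
  end
  local target = span.attributes.target
  if not target then
    return nil
  end
  local id = headers[heading_text(target)]
  if not id then
    return nil  -- no header with that text; leave the span untouched
  end
  -- Use the emphasised text as the link text when the span wraps an Emph;
  -- otherwise fall back to the span's own content.
  local first = span.content[1]
  local content = span.content
  if first and first.t == 'Emph' then
    content = first.content
  end
  return pandoc.Link(content, '#' .. id)
end

-- Two passes: collect all headers first, then rewrite the spans.
return {
  {Header = collect},
  {Span = fix_spurious_link}
}
```

Saved under a name of your choosing, for example `fix-org-links.lua`, it can be applied with pandoc's `--lua-filter` option: `pandoc -f org -t commonmark --lua-filter=fix-org-links.lua README.org`. Like the original filter, this is a workaround and not a fix for the Org reader itself.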
This has not been fixed. Thanks to @tarleb, that filter does work.
Could somebody briefly summarize what work still needs to be done on this issue? (With examples and references to org documentation)