org-mode: internal links are interpreted & rendered as emphasis

Open tg-x opened this issue 4 years ago • 14 comments

Internal links in org-mode input are interpreted incorrectly and rendered as emphasis in HTML and Markdown output.

example

tested using pandoc 2.10.1

org input

> cat test.org
* Some section

* section

* sectiox

[[#some-section]]

[[some section]]
[[Some section]]
[[*Some section]]

[[some section][some section]]
[[Some section][Some section]]
[[*Some section][Some section]]

md output

> pandoc --to markdown test.org
Some section
============

section {#section-1}
=======

sectiox
=======

[\#some-section](#some-section) [some section](#some-section)

*some section* *Some section* *\*Some section*

*some section* *Some section* *Some section*

html output

> pandoc --to html test.org
<h1 id="some-section">Some section</h1>
<h1 id="section-1">section</h1>
<h1 id="sectiox">sectiox</h1>
<p><a href="#some-section">#some-section</a> <a href="#some-section">some section</a></p>
<p><em>some section</em> <em>Some section</em> <em>*Some section</em></p>
<p><em>some section</em> <em>Some section</em> <em>Some section</em></p>

expected

All of the above should be interpreted & rendered as links instead of emphasis.

On a related note, I found a minor issue while testing: a section titled "Section" gets the ID #section-1 instead of #section.

tg-x avatar Dec 03 '20 16:12 tg-x

related: #6917

tg-x avatar Dec 03 '20 17:12 tg-x

a section titled "Section" gets the ID #section-1 instead of #section

This may be because you have another section with the same name? Pandoc ensures that the ids are unique.

jgm avatar Dec 03 '20 17:12 jgm

a section titled "Section" gets the ID #section-1 instead of #section

This may be because you have another section with the same name? Pandoc ensures that the ids are unique.

No, this happens even with a single-line file containing only that section, and it doesn't happen with other section titles:

> cat test.org
* section
* sectiox
> pandoc --to html5 test.org
<h1 id="section-1">section</h1>
<h1 id="sectiox">sectiox</h1>
> cat test.md
# section
# sectiox
> pandoc --to html5 test.md
<h1 id="section">section</h1>
<h1 id="sectiox">sectiox</h1>

tg-x avatar Dec 03 '20 17:12 tg-x

We must have submitted our issues within moments of one another. lol Anyway, I closed my (very closely related) issue #6917 as you seem to have beaten me to the submit button by a nose. :smile:

In my case, the in-page link:

[[*Setup][Setup]]

...is not being rendered as a link at all. Instead it gets converted to italicised text:

*Setup*

But that's just a different way of restating what you already wrote above. I am pretty sure this is all the same issue. As discussed in my post to the mailing list, @tarleb seems to have already confirmed that this is indeed an issue with the Org reader.

TRSx80 avatar Dec 03 '20 17:12 TRSx80

Yes, it's the same issue. It also occurs with Markdown output and with links that have a description, so the problem is most likely in the reader. I updated the issue title and description accordingly.

tg-x avatar Dec 03 '20 18:12 tg-x

I noticed the recent commit (referenced immediately above this post) which seems to address this issue. Thanks a lot @tarleb! :+1:

So I compiled the latest version from source to test. Running the command again:

pandoc -f org -t commonmark README.org

...on the exact same file (and example) mentioned further up the thread, the relevant output is now:

<span class="spurious-link" target="*Setup">*Setup*</span>

Which I suppose addresses at least part of the problem. However, I am guessing that something else needs to be done "on the other (Markdown) side" to arrive at the output I am expecting (i.e., a link which will be formatted in Markdown)?

Maybe that is even, strictly speaking, outside the scope of this particular issue.

tarleb mentions in the commit notes:

This allows to recover and fix broken or unknown links with filters.

However, being fairly new to pandoc, I must confess to being a bit lost as to what this bit about "filters" means. If I just need to go RTFM, I will be happy to do so; but is there some additional functionality that still needs to be implemented before this will work as I am hoping?

TRSx80 avatar Dec 09 '20 23:12 TRSx80

The remaining step is to teach the Org reader to recognize the link type and properly link to the header. The nice thing about the small change I made is that it makes it possible to build workarounds with filters. Below is a hacky Lua filter which finds and fixes links of type [[*Some section][Some section]]. But we'll still need to do it cleanly and integrate it into the Org reader.

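-- Workaround filter: turn "spurious-link" spans back into links to headers.
local stringify = pandoc.utils.stringify
local headers = {}

-- First pass: map each header's plain text to its identifier.
function collect (header)
  headers[stringify(header)] = header.identifier
end

-- Second pass: replace a spurious-link span with a link to the matching header.
function fix_spurious_link (span)
  if span.classes:includes 'spurious-link' then
    -- The span wraps an Emph element; use its content as the link text.
    local content = span.content[1].content
    local target = span.attributes.target
    -- Drop the leading "*" of a "*Some section" target before the lookup.
    local header_target = headers[target:sub(2)]
    if header_target then
      return pandoc.Link(content, '#' .. header_target)
    end
  end
end

-- Run the passes in order so all headers are known before spans are fixed.
return {
  {Header = collect},
  {Span = fix_spurious_link}
}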

tarleb avatar Dec 10 '20 10:12 tarleb

Did this ever get added?

IllustratedMan-code avatar Jul 08 '22 13:07 IllustratedMan-code

I wasn't able to use this script without modification. I removed the :sub(2) and it worked.
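
For anyone hitting the same thing, here is a minimal sketch of a variant that accepts targets with or without the leading `*`, so the `:sub(2)` does not have to be edited by hand. It assumes the reader still emits the `spurious-link` span with a `target` attribute as shown earlier in the thread. `strip_star` is a hypothetical helper name, and the `Emph` check is only a guess at how the span content is wrapped.

```lua
-- Sketch only: tolerate both "*Some section" and "Some section" targets.
local stringify = pandoc.utils.stringify
local headers = {}

-- Hypothetical helper: remove a single leading "*" if there is one.
local function strip_star (target)
  return (target:gsub('^%*', ''))
end

-- First pass: remember each header's plain text and identifier.
local function collect (header)
  headers[stringify(header)] = header.identifier
end

-- Second pass: rebuild a link when the target names a known header.
local function fix_spurious_link (span)
  if not span.classes:includes 'spurious-link' then
    return nil
  end
  local target = span.attributes.target
  if not target then
    return nil
  end
  local id = headers[strip_star(target)]
  if id then
    -- Assumption: the span usually wraps one Emph; otherwise keep its content.
    local first = span.content[1]
    local content = span.content
    if first and first.t == 'Emph' then
      content = first.content
    end
    return pandoc.Link(content, '#' .. id)
  end
end

return {
  {Header = collect},
  {Span = fix_spurious_link}
}
```

Saved under a name of your choice, for example `fix-links.lua`, it can be passed to pandoc with `--lua-filter fix-links.lua`. Header matching here is still an exact, case-sensitive comparison of the header text, so it is not a substitute for proper support in the Org reader.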

IllustratedMan-code avatar Jul 08 '22 13:07 IllustratedMan-code

This has not been fixed. Thanks to @tarleb, that filter does work.

rickswe avatar Aug 18 '23 16:08 rickswe

Could somebody briefly summarize what work still needs to be done on this issue? (With examples and references to org documentation)

jgm avatar Aug 18 '23 16:08 jgm