pandoc
Markdown footnotes are duplicated
If you add multiple references to one footnote in Markdown like this:
This is a sentence [^fn] with footnote.
This is another sentence [^fn] with footnote.
[^fn]: This is the footnote.
pandoc will create multiple footnotes instead of referencing only one. For example, if converted into Markdown again, one would end up with:
This is a sentence[^1] with footnote.
This is another sentence [^2] with footnote.
[^1]: This is the footnote.
[^2]: This is the footnote.
The same happens when converting to HTML (I didn't try other formats). Obviously, this should not happen; there should be a single footnote that is linked from multiple points in the text.
I've tested this locally using version 1.12.3.3 as well as with 1.13.1 at Try pandoc!
I confirm the bug.
Just to compare: Another markdown processor, multimarkdown, when processing from md to html, generates many references to one footnote.
It would probably be better to present a uniform solution which works across all output formats.
I guess having a uniform solution is never a bad idea, but I'm wondering which formats support footnotes but not multiple references to one footnote?
I have no idea to be honest. That might be a good argument to just apply this change to the markdown writer.
Why is the issue marked as "enhancement"? I would say it is a bug.
+++ Václav Haisman [Jan 04 16 07:16 ]:
What about using the solution described in http://tex.stackexchange.com/a/262956/28495?
If we did make a change allowing multiple references to a single footnote, I'd probably implement it in LaTeX without relying on an external package like fixfoot, to keep down dependencies.
But one thing to keep in mind is that pandoc supports many output formats, not just LaTeX. Any changes in how footnotes are handled require changes in all of them.
It seems this issue has a long history and discussion thread I am not aware of. Moreover, I am not experienced in this area, but…
mpickering wrote:
It would probably be better to present a uniform solution which works across all output formats.
I would not agree with this. Output formats differ, and a uniform solution may not (or even will not) work for all of them.
For example: the HTML output format assumes one (probably long) page. All the footnotes are printed at the very bottom, and having a separate footnote for every link looks ugly. (Backlinks are not a problem, at least conceptually; look at the MediaWiki/Wikipedia solution: they have multiple backlinks in a single footnote.)
Another output format (TeX?) may assume the output document consists of multiple pages, with footnotes printed at the bottom of each page. In such a case, it is natural to print the footnote on each page containing a reference.
So the internal representation (I am not aware what you call it, probably AST?) should be able to represent multiple links to the same footnote. Each writer (HTML, TeX, etc.) should be able to handle it: either generate a single footnote or multiple footnotes, depending on the output format; no uniform solution is needed.
Any progress on this whatsoever?
No progress. It would be a fairly involved change, as noted above, affecting pandoc-types, all readers, and all writers. There are higher-priority things to work on at the moment.
Just ran into this issue. Would love to see pandoc handle footnotes like this correctly at some point.
FWIW, the same happens with reStructuredText footnotes.
Is there any plan for when this will be fixed?
Just ran into this as well. Hoping for a fix, even though it sounds non-trivial.
I am really interested in this issue. I would like to use the same footnote in different places without having it duplicated.
@tarleb, I read your advice in #5196, but in my case I am not interested in using footnotes with references inside; I just want to use them as notes, and having to repeat the same text twice is very annoying... Is there any plan to fix this, @jgm?
Of course, thanks in advance for your time! Pandoc is an awesome and very useful tool :heart:
The problem is how Note is defined in the pandoc document AST. A note is simply an inline element, just like Strong or Strikethrough. Even though the note seems to have two distinct parts (the [^1] and the [^1]: text in the Markdown), that's not how it's represented in the AST; there is only one part there (you can see this with -t native).
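To make the point about the AST concrete, here is a minimal sketch that inspects a hand-written JSON document in the shape pandoc produces with -t json. The document below is abbreviated and written by hand for illustration (it is not captured pandoc output), and collect_notes is a hypothetical helper name. It shows that each footnote reference is a self-contained Note element carrying its own copy of the text, with no identifier.

```python
import json

# Hand-written JSON in the shape of pandoc's JSON AST (abbreviated, illustrative).
DOC = json.loads("""
{"pandoc-api-version": [1, 17, 6],
 "meta": {},
 "blocks": [
  {"t": "Para", "c": [
    {"t": "Str", "c": "First"},
    {"t": "Note", "c": [{"t": "Para", "c": [{"t": "Str", "c": "Footnote."}]}]}
  ]},
  {"t": "Para", "c": [
    {"t": "Str", "c": "Second"},
    {"t": "Note", "c": [{"t": "Para", "c": [{"t": "Str", "c": "Footnote."}]}]}
  ]}
 ]}
""")


def collect_notes(node):
    """Recursively collect every Note element in a JSON AST fragment."""
    found = []
    if isinstance(node, list):
        for child in node:
            found.extend(collect_notes(child))
    elif isinstance(node, dict):
        if node.get("t") == "Note":
            found.append(node)
        # Recurse into all values; strings and numbers yield nothing.
        for value in node.values():
            found.extend(collect_notes(value))
    return found


notes = collect_notes(DOC["blocks"])
print(len(notes))            # 2: one Note per reference in the source
print(notes[0] == notes[1])  # True: same content, but nothing links them
```

Nothing in either Note records that both came from the same [^fn] label, which is why a writer cannot tell a shared footnote from two footnotes that happen to have the same text.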
Thus the cleanest solution would be to change the AST, but that's something we try to do rarely, as it has the potential to break all pandoc filters etc. Btw. there is a proposal that would also change the Note element in the AST.
The alternative would be to somehow de-duplicate the Notes. But that seems somewhat hacky, as we only have the note-text to go by (the unique identifier is lost by then), but it's certainly something you could do in a pandoc filter.
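As a rough illustration of the filter-based alternative, the sketch below de-duplicates notes by comparing their content. It works on plain Python dicts in the shape of pandoc's JSON AST; the names note and dedupe_notes are hypothetical, and the numbering assumes the writer numbers footnotes 1, 2, 3, ... in document order, which is an assumption and will not hold for every output format.

```python
import json


def note(text):
    """Build a Note inline in pandoc's JSON shape (helper for this example)."""
    return {"t": "Note", "c": [{"t": "Para", "c": [{"t": "Str", "c": text}]}]}


def dedupe_notes(node, seen):
    """Return a copy of `node` in which repeated Notes become superscript numbers.

    `seen` maps a note's serialized content to the number of its first
    occurrence. Only the note text is available as a key.
    """
    if isinstance(node, list):
        return [dedupe_notes(child, seen) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Note":
            key = json.dumps(node["c"], sort_keys=True)
            if key in seen:
                # Later duplicate: replace it with a plain superscript number.
                number = str(seen[key])
                return {"t": "Superscript", "c": [{"t": "Str", "c": number}]}
            seen[key] = len(seen) + 1  # assumed footnote number
            return node
        return {k: dedupe_notes(v, seen) for k, v in node.items()}
    return node


blocks = [
    {"t": "Para", "c": [{"t": "Str", "c": "First"}, note("Same text.")]},
    {"t": "Para", "c": [{"t": "Str", "c": "Second"}, note("Same text.")]},
]
result = dedupe_notes(blocks, {})
print(result[1]["c"][1])
```

A real JSON filter would read the document from stdin and write the transformed document to stdout. Note that this approach also merges notes that are intentionally identical, and the superscript is not a real link to the footnote.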
@mb21 I am strongly in favour of changing AST and separating notes from note references.
Currently the reader has to match note references to notes, so it has to postpone generation of the AST until all notes are collected. It is also the reason for ugly F/Future in Markdown, Muse and other readers, and two-pass parsing in RST reader.
Separating the notes from references will make it possible to shift the function of note reference lookup to writers, where all the notes are already collected.
It is also the reason for ugly F/Future in Markdown, Muse and other readers, and two-pass parsing in RST reader.
Not the only reason. There are also e.g. reference links. Changing this wouldn't remove the need for Future or two-pass.
Note also that the proposed AST change would require two passes (or something equivalent) in many of the writers. So I don't think there is any net decrease in complexity.
In addition to breaking filters etc., the change would be quite a lot of work, since every reader and writer would have to be modified. That's not to say we shouldn't do it. But I would prioritize better table and figure support over this.
Note that separating note references and notes in the AST would also allow a fix for #2053...though there may be some complications handling that in some output formats.
De-duplicating the notes is not a good solution, since people might want to have multiple notes with the same content (e.g., "Ibid., p. 33").
Might be an obvious quick fix, but for LaTeX a simple $^{<number>}$ can be used if the footnote number is known and will not change anymore.
Reading this issue discussion made me think that there is general potential for a structural improvement to the AST that would allow for smoother changes. More specifically: would it be possible to make the required change in the AST, but also have "a function" that converts the AST into how it would have looked before, and then let the readers declare which version of the AST they can supply, and let the writers declare which version(s) of the AST they support or prefer? Or, instead of linear versions only, it could even be version+extensions. Such a scenario would allow changing the AST and then gradually implementing support for new features in the readers and writers: a sort of parallelization of development efforts, if you will.
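A minimal sketch of what such a conversion function could look like, assuming a hypothetical new representation in which note definitions are separate blocks and references are inlines. The element names NoteDef and NoteRef and the function downgrade are invented for this illustration; none of them exist in pandoc. The function rewrites the hypothetical form back into today's inline Note.

```python
import copy


def downgrade(blocks):
    """Convert the hypothetical split representation back to inline Notes."""
    # First pass: collect top-level note definitions by identifier.
    defs = {b["c"][0]: b["c"][1] for b in blocks if b.get("t") == "NoteDef"}

    def walk(node):
        if isinstance(node, list):
            # Drop definition blocks; they are inlined at each reference.
            return [walk(child) for child in node
                    if not (isinstance(child, dict) and child.get("t") == "NoteDef")]
        if isinstance(node, dict):
            if node.get("t") == "NoteRef":
                # Each reference gets its own copy of the note content.
                return {"t": "Note", "c": copy.deepcopy(defs[node["c"]])}
            return {k: walk(v) for k, v in node.items()}
        return node

    return walk(blocks)


new_style = [
    {"t": "Para", "c": [{"t": "Str", "c": "One"}, {"t": "NoteRef", "c": "fn"}]},
    {"t": "Para", "c": [{"t": "Str", "c": "Two"}, {"t": "NoteRef", "c": "fn"}]},
    {"t": "NoteDef", "c": ["fn", [{"t": "Para", "c": [{"t": "Str", "c": "Shared."}]}]]},
]
old_style = downgrade(new_style)
print(len(old_style))  # 2: the definition block is gone
```

The downgrade loses the information that both references shared one note, so a writer consuming the old form would still produce duplicated footnotes. A reference to an undefined identifier raises KeyError in this sketch.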
The problem is that the AST is not only exposed to the readers and writers, but also potentially to Haskell programs using pandoc as a library (although a lot of them hopefully use Text.Pandoc.Builder), and worst of all, to pandoc filters. The latter could be fixed long-term to some degree by changing the JSON serialization format to be more human-readable and less Haskell-ADT-oriented (see this comment), although some breakage is probably unavoidable unless you go for full-blown versioning and 100% backwards compatibility, which has a lot of mental and development overhead.
What if everything was versioned (including the things you mentioned, like filters), and we do backwards compatibility where feasible, and where it is not implemented, we fail gracefully with a meaningful message like: "Error: Filter abc.py (supporting versions [0.3 - 0.4]) does not support any of the AST versions available ([0.6 - 0.9]) for the current input and output formats."?
The JSON output does include version information like "pandoc-api-version":[1,17,6]. The idea of providing compatibility layers is good, it could help to avoid a schism of the python 2/3 kind. We may have to take that step at some point, but updating everything will still be a huge amount of work for the reasons mentioned by @mb21.
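As a small sketch of how a filter could use that version field to fail gracefully, the function below reads "pandoc-api-version" from a parsed JSON document and compares it against a supported range. The function name check_api_version, the filter name, and the version ranges are hypothetical, chosen only for this example.

```python
def check_api_version(doc, supported_min, supported_max, filter_name):
    """Return the document's API version, or raise ValueError if unsupported.

    Versions are compared as tuples, so (1, 17) <= (1, 17, 6) <= (1, 18).
    """
    version = tuple(doc["pandoc-api-version"])
    if not (tuple(supported_min) <= version <= tuple(supported_max)):
        def fmt(v):
            return ".".join(str(part) for part in v)
        raise ValueError(
            f"Filter {filter_name} (supporting versions "
            f"[{fmt(supported_min)} - {fmt(supported_max)}]) does not support "
            f"AST version {fmt(version)}."
        )
    return version


doc = {"pandoc-api-version": [1, 17, 6], "meta": {}, "blocks": []}
print(check_api_version(doc, (1, 17), (1, 18), "abc.py"))  # (1, 17, 6)

try:
    check_api_version(doc, (1, 20), (1, 22), "abc.py")
except ValueError as err:
    print(err)
```

This only detects a mismatch; it does not provide the compatibility layer itself.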
Yeah, versioning cannot reduce the total work, only the work that has to be done at once, so to speak... right? If we assume that everything that is unversioned supports only the current version (not current at the time, but current when the versioning is introduced), it should all be gradual, no? But yeah... lots of work.
Does this issue fall under the topic of cross references?
Any progress on this?
Is there a solution to this issue? I stumbled on the bug while writing an article and needed to reference the same footnote more than once.
I'm also running into this issue in a document I'm working on. Would love to see a solution.
Same problem here. This bug is 10 years old; in less than another 10 years it will come of age :)
There are two ways I can see to handle this. Both would require changes to the AST.
1. Add a Block constructor Note Identifier [Block]. Change the Inline constructor for Note to NoteRef Identifier.
2. Keep Note as an Inline constructor, but add an identifier, e.g. Note Identifier [Block] or maybe Note (Maybe Identifier) [Block].
Idea 1 is a heavyweight change. It would require fairly extensive changes to all of the readers and writers.
Idea 2 is lighter-weight. It would still require changes throughout the ecosystem, but we could start with lightweight changes. For example, we could keep the existing note builder and have it simply initialize the identifier with an empty or Nothing value, so that existing readers that use note would not need changes to compile. And then writers could be revised to simply ignore the Identifier field -- this would allow the project to be compiled with its present behavior. Going forward, we could start supporting the identifier piecemeal -- e.g. adding it in the Markdown reader, and paying attention to it in writers. The idea would be that subsequent notes with the same identifier would be rendered as references to the same note (regardless of their [Block] content).
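The writer behaviour described for Idea 2 can be sketched with a toy model. This is not pandoc code: Str, Note, and render are simplified stand-ins written for this example, the identifier field is the hypothetical addition, and a plain string stands in for the [Block] content. Notes that share an identifier are rendered as references to one footnote, while notes without an identifier keep today's behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Str:
    text: str


@dataclass
class Note:
    identifier: Optional[str]  # hypothetical new field (Idea 2)
    content: str               # stands in for [Block]


Inline = Union[Str, Note]


def render(inlines: List[Inline]) -> Tuple[str, List[str]]:
    """Toy writer: return the body text and the list of footnotes."""
    numbers = {}   # identifier -> footnote number already assigned
    footnotes = []
    parts = []
    for el in inlines:
        if isinstance(el, Str):
            parts.append(el.text)
            continue
        if el.identifier is not None and el.identifier in numbers:
            # Same identifier seen before: reuse the existing footnote,
            # regardless of this note's content.
            n = numbers[el.identifier]
        else:
            footnotes.append(el.content)
            n = len(footnotes)
            if el.identifier is not None:
                numbers[el.identifier] = n
        parts.append(f"[{n}]")
    return "".join(parts), footnotes


body, notes = render([
    Str("A"), Note("fn", "Shared."),
    Str(" B"), Note("fn", "Shared."),
    Str(" C"), Note(None, "Ibid."),
    Str(" D"), Note(None, "Ibid."),
])
print(body)   # A[1] B[1] C[2] D[3]
print(notes)  # ['Shared.', 'Ibid.', 'Ibid.']
```

Because matching is by identifier rather than by content, two separate notes with identical text (such as "Ibid.") stay separate, and a writer that ignores the identifier entirely would simply reproduce the present behaviour.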
Idea 2 seems a practical way forward with this issue. Still, it must be said that any changes to the API will cause problems for a lot of users. There are 50+ people who have +1'd this issue, but there could be many thousands who could be affected negatively by the change (at least temporarily). @tarleb any thoughts?