Allow multiple note types (footnotes, endnotes, etc.)

Open adunning opened this issue 9 years ago • 22 comments

As discussed on the mailing list, it is possible in Word, Open/LibreOffice, LaTeX, and elsewhere to have both footnotes and endnotes in a single document, and publishers (examples from Oxford, Peeters) request that texts be submitted this way for various purposes (e.g. citations plus a glossary; separate textual and source notes in a scholarly edition). The final version might be rendered as footnotes plus endnotes (e.g. a commentary at the end of a text); multiple sets of footnotes or endnotes (often using some sort of differentiated marker system, e.g. letters in addition to numbers or a key to line numbers); or footnotes plus sidenotes.

There have also been requests in the past for sidenotes/margin notes. The consensus on the list seems to be that this could be useful, but isn't as necessary as simply having some ability to have two separate series of notes, especially given that the creation of this in other formats is less clear-cut outside LaTeX (marginpar), DocBook (sidebar; cf. the AsciiDoc implementation), and perhaps HTML (aside). It can be done only awkwardly in LibreOffice and Word, though I have heard of publishers using Word comments as text to be put in the margin. As another approach, Classical Text Editor allows as many as fifty separate series of notes (plus text to be placed in the inner and outer margins), each of which has separately-controlled rendering. This is surely too extreme for Pandoc (cf. the eledmac package in LaTeX), but it's an interesting approach.

It was suggested by @jkr that it could be structured like this:


data NoteType = Footnote | Endnote | Marginnote

data Inline = ...                      -- existing constructors unchanged
            | Note NoteType [Block]    -- Note gains a NoteType argument
            ...

This is a summary of the list's proposals for the Markdown syntax, most of which centre around changing the character used for the footnote:

Written as footnotes

  • [^1]: footnote and [#1]: endnote and [>1]: sidenote
  • [^1]: footnote and [^^1]: endnote and [^^^1]: sidenote
  • [^1]: footnote and [^a]: endnote (i.e. if different identifiers are being consistently used)

Written as inline notes

  • ^[footnote] and ^^[endnote]
  • ^[^ footnote] (the extra caret being optional) and ^[# endnote] and ^[> sidenote] and ^[| inline note]

adunning avatar Jul 14 '14 20:07 adunning

  • I don't think the [^1] vs. [^a] version will work as the current syntax for footnotes is actually [^arbitrary_identifier] so different 'magic characters' instead of/in addition to ^ is the way to go.
  • One possible way of marking endnotes would be [~id] or [^~id] since Pandoc already uses ~sub~ vs. ^sup^.
  • NB that it's not trivial to use very many \marginpars (as e.g. with margin glosses) in LaTeX since there is a finite but somewhat adjustable amount of memory allocated to them. In practice I have to use one marginpar with linebreaks for each paragraph.

bpj avatar Jul 15 '14 10:07 bpj

A thought on [^1] vs [^a]: One thing it would allow would be to control whether footnotes, endnotes, or both were generated via markdown extensions. If the normal [^ is considered the base, then you can have a +endnotes option that turns [^a] into endnotes - without changing the file. Which means you get complete backwards compatibility.

DanStaal avatar Jul 15 '14 18:07 DanStaal

The problem with [^a] vs. [^1] is that it breaks backwards compatibility in a bad way. The ability to use arbitrary note identifiers is a valuable feature. While working on a document you can use a descriptive id to find note references and definitions more easily. Even more importantly, if you have already reformatted your source with pandoc and want to insert more notes, you don't need to look up the highest id number already in use, or assign some arbitrarily astronomical number that will be out of sequence anyway: you just assign a descriptive id and have it turned into an in-sequence number.

Also how is [^a] vs. [^1] to interact with --id-prefix, which is really indispensable?

That's why I think the best way to indicate endnotes and sidenotes is to add some punctuation after the caret. Since footnote ids can currently contain any characters other than spaces, tabs and newlines (does that mean they can contain carriage returns? ;-), 'compatibility mode' would just treat that extra punctuation as part of the id.

I no longer like [^^endnote], which was my first thought, but [^#endnote] or [^~endnote] and the obvious [^>sidenote] would work and be backwards compatible. It would also be more economical with punctuation, since [#foo] [~bar] [>baz] could then be used for something else later (cf. [@cite] -- I would particularly like [#anchor]! :-).
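A minimal sketch of the backwards-compatible scheme described above, assuming the note identifier (the text between `[^` and `]`) is already available as a string. The name `classify_note_id` and the marker table are illustrative only and are not part of Pandoc.

```python
# Hypothetical marker table: punctuation directly after the caret selects the note type.
MARKERS = {"#": "endnote", "~": "endnote", ">": "sidenote"}


def classify_note_id(raw_id, compatibility_mode=False):
    """Split a raw note id into (note_type, id).

    In compatibility mode the leading punctuation stays part of the id,
    so existing documents keep producing plain footnotes.
    """
    if not compatibility_mode and len(raw_id) > 1 and raw_id[0] in MARKERS:
        return MARKERS[raw_id[0]], raw_id[1:]
    return "footnote", raw_id


print(classify_note_id("#glossary"))        # ('endnote', 'glossary')
print(classify_note_id(">aside"))           # ('sidenote', 'aside')
print(classify_note_id("plain"))            # ('footnote', 'plain')
print(classify_note_id("#glossary", True))  # ('footnote', '#glossary')
```

Because the marker is ordinary id text when the feature is off, a document written with `[^#glossary]` still converts today, just with every note treated as a footnote.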

Another matter is that automatic numbering of endnotes should probably use the a..z,aa,ab..zy,zz series.
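The a..z, aa, ab, ... series is a bijective base-26 count. A short sketch of how such labels could be generated, with `alpha_label` as a hypothetical helper name:

```python
import string


def alpha_label(n):
    """Return the n-th label (1-based) in the series a..z, aa, ab, ..., zz, aaa, ..."""
    if n < 1:
        raise ValueError("note numbers start at 1")
    letters = []
    while n > 0:
        # Subtracting 1 first makes the digits run a..z with no "zero" digit.
        n, rem = divmod(n - 1, 26)
        letters.append(string.ascii_lowercase[rem])
    return "".join(reversed(letters))


print([alpha_label(i) for i in (1, 26, 27, 28, 702)])
# ['a', 'z', 'aa', 'ab', 'zz']
```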

OTOH I think that the best change in the long run would be to store the user-assigned note id in the AST. One could then use filters to convert notes into endnotes, sidenotes or footnotes according to one's needs, on the basis of the presence of some (sequences of) characters in the id. This is also a change which @jgm has already said he might consider!

bpj avatar Jul 15 '14 19:07 bpj

Strawman idea: what if the location of the note body in the source affected its position in the output? Something like:

# Section

Text [^marginnote] text [^footnote].

[^marginnote]: written following the reference more or less immediately.

More text [^endnote].
(How would I distinguish margin vs foot if this paragraph wasn't here?)

----

[^footnote]: collected at end of section.  This is ambiguous; requiring them to be below a `----` might be a good solution.

# New Section
...

# Endnotes

[^endnote]: collected at end of document?  A more robust criterion is being in another section -- people might want stuff after endnotes, e.g. a bibliography...

There are obviously holes in this proposal, and compatibility problems - until now all of these were footnotes of equal standing. But it's such a natural direction...

cben avatar Jul 23 '14 17:07 cben

Seems fragile -- I know I would always be getting it wrong. Plus it would break concatting docs, which is pretty useful for multi-file documents.

My biggest interest is in having it in the AST -- I imagine you could filter to get there if you needed it, without disturbing most folks' use of just one note type. e.g.

Blah[^1]. Bleh[^2].


[^1]: default sort of note. Let's say footnote.
[^2]: <div class="endnote">I'm at the end.</div>

It's a pain to write. But the point is, it can be in the AST and useful for the small percentage who need it without needing to decide on markdown syntax for it. This was what happened with the generic divs and spans. The above also has the advantage of degrading gracefully.
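A rough sketch of what a filter could do with the div-wrapped note above, assuming the document has been read into Pandoc's JSON AST, where a note is `{"t": "Note", "c": [blocks]}` and a div is `{"t": "Div", "c": [[id, classes, attrs], blocks]}`. The function name `note_kind` and the `sidenote` class are assumptions made for illustration.

```python
def note_kind(note, default="footnote", known=("endnote", "sidenote")):
    """Return (kind, blocks) for a Note element from Pandoc's JSON AST.

    If the note body is a single Div carrying one of the known classes,
    the Div is unwrapped and its class becomes the note kind.
    """
    blocks = note["c"]
    if len(blocks) == 1 and blocks[0]["t"] == "Div":
        (_ident, classes, _attrs), inner = blocks[0]["c"]
        for cls in classes:
            if cls in known:
                return cls, inner
    return default, blocks


plain = {"t": "Note", "c": [{"t": "Para", "c": [{"t": "Str", "c": "default"}]}]}
wrapped = {"t": "Note", "c": [{"t": "Div", "c": [
    ["", ["endnote"], []],
    [{"t": "Plain", "c": [{"t": "Str", "c": "end"}]}],
]}]}

print(note_kind(plain)[0])    # footnote
print(note_kind(wrapped)[0])  # endnote
```

A writer-specific filter would then decide what to emit for each kind. Without any filter, the div is simply rendered inside an ordinary footnote.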

That being said, if I had to vote for some sort of ascii markup, I would probably go for some replacement for the caret:

[^footnote]
[>marginnote]
[~endnote]

or the like.

jkr avatar Jul 23 '14 18:07 jkr

Or, yeah, the backwards-compatible version in @bpj's proposal above.

jkr avatar Jul 23 '14 18:07 jkr

At least in the near term, at this point I'll agree the best is to get all of this info into the AST, so a filter can handle it. A fairly generic filter would then be a good thing for a 'pandoc extras' package.

There is the possibility of leaving the options in the filters - not everything has to be built into pandoc, if there is a good set of readily-available filters for 'common uncommon' things. It would keep pandoc simpler (trying to avoid the 'kitchen sink effect'), while still allowing the more powerful uses.

DanStaal avatar Jul 23 '14 23:07 DanStaal

The only downside I'd see to leaving it in the filters is that you couldn't simulate the different-carets, since the AST doesn't keep the footnote marker. But other than that, I'd tend to agree. The only user-facing thing that would need to come with a change to the AST would be some sort of --default-note-type=... option.

I also like the idea of a pandoc-extras/pandoc-contrib filters package. A place to set up niche features, or try out some features on their way to inclusion.

This all makes me realize that I'd really like to see a filter option in the YAML block. In some docs I have a hefty amount of filters, but don't really want to set up a Makefile, for whatever reason. It would be nice to have the doc keep track of the filters it wants.

jkr avatar Jul 24 '14 13:07 jkr

You could place a marker character as the first char of the footnote text, optionally followed by an all-alphanumeric id and then a space. Perhaps you could put it in braces so you won't need to backslash escape the marker. Not as elegant, but the filter could see it, inspect it, and remove it from the note's content list before it possibly munges the note element. I've been thinking all day about whipping up a proof of concept, but I'm going away on Saturday and won't likely have time before then. Of course the real solution would be for all elements to take attributes, written in braces as the very first thing in the element: at the top of block elements, and as the first thing after the delimiter in inline elements.

bpj avatar Jul 24 '14 15:07 bpj

Interestingly, it turns out that someone has already made their own fork of Pandoc to accomplish this: https://github.com/susanemcg/pandoc-tufteLaTeX2GitBook.

adunning avatar Jul 31 '14 06:07 adunning

I've also written some filters for this, in dfaligertwood/utility-scripts. They're not particularly well described, however (e.g. pandoc-musicnotes is actually second level textual annotations using edNotes.sty in LaTeX..)

I've used the first character of the footnote text to identify the type. This isn't really a brilliant method though, and my implementation is quite brittle. IMO the ideal method would be to have either the footnote id or additional attributes available in the AST for filters.

dfaligertwood avatar Sep 16 '14 20:09 dfaligertwood

I think the best option would be to go ahead with the YAML method suggested by @jkr: https://github.com/jgm/pandoc/issues/1425#issuecomment-50009783

If this is within the YAML metadata, then this would allow for docx/odt -> markdown -> docx/odt conversion to work, otherwise, it would only be a one-way process.

the-solipsist avatar Aug 28 '15 18:08 the-solipsist

FWIW, this would be a great feature, and this proposal seems quite attractive to me:

  • [^1]: footnote and [#1]: endnote and [>1]: sidenote

charlesangus avatar Jul 18 '19 23:07 charlesangus

Wow, @charlesangus's comment jogged my memory!

In the years since 2014 I have written two filters in Perl which address this issue. The first used the idea of having a character (actually a character in a code span) at the beginning of the note signal the note type: `~` for endnote; `>` for sidenote, implemented with marginnote, so the "arrow" may be followed by a value for the voffset parameter, e.g. `>2cm`; and `&prefix` for multiple apparatuses in the sense of manyfoot and bigfoot. I already used (and still use) a filter which uses `#id` to insert anchors, so I chose different magic characters. I later did a rewrite of this filter which uses spans with classes and attributes. The upside is that the code became simpler; the downside is that the attributes aren't visible if you build the document without the filter during editing.
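The marker convention of the first filter could look roughly like this. It is a Python sketch over Pandoc's JSON AST, not the Perl filter itself; the function names and the returned dictionaries are made up for illustration. It assumes an inline code span is encoded as `{"t": "Code", "c": [attr, text]}`.

```python
def parse_marker(code):
    """Interpret the text of a leading code span as a note-type marker.

    Returns a dict describing the note, or None if the text is not a marker.
    """
    if code == "~":
        return {"type": "endnote"}
    if code.startswith(">"):
        # Anything after the arrow is taken as a vertical offset, e.g. ">2cm".
        return {"type": "sidenote", "voffset": code[1:] or None}
    if code.startswith("&") and len(code) > 1:
        return {"type": "apparatus", "prefix": code[1:]}
    return None


def split_marker(note):
    """Pop a leading marker code span off a Note element.

    Returns (marker_info, note); the note is modified in place
    only when a marker is actually found.
    """
    blocks = note["c"]
    if blocks and blocks[0]["t"] in ("Para", "Plain"):
        inlines = blocks[0]["c"]
        if inlines and inlines[0]["t"] == "Code":
            info = parse_marker(inlines[0]["c"][1])
            if info is not None:
                del inlines[0]
                # Drop the space that separated the marker from the text.
                if inlines and inlines[0]["t"] == "Space":
                    del inlines[0]
                return info, note
    return {"type": "footnote"}, note


example = {"t": "Note", "c": [{"t": "Para", "c": [
    {"t": "Code", "c": [["", [], []], ">2cm"]},
    {"t": "Space"},
    {"t": "Str", "c": "text"},
]}]}
print(split_marker(example)[0])  # {'type': 'sidenote', 'voffset': '2cm'}
```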

The other filter did part of the same thing but for HTML. It could theoretically place text in spans which could be turned into sidenotes or popups with CSS but my CSS skills weren't really good enough to make it work satisfactorily, at least not in all browsers. In this filter all types of notes had to have an id, which appeared as an actual part of the ids on the HTML elements which the filter created (and I changed the LaTeX filter so that it could either ignore those ids properly or use them properly): basically `~foo` became id="endnote-foo". "Endnotes" and "manynotes" could also be written to another Markdown or HTML file which you could link to (the filter built an AST and shelled out to Pandoc to write the file) and possibly display in an HTML frame. The difference between endnotes and "footnotes" in HTML was that the latter could be placed after the paragraph or after sections of a certain level (actually the notes were collected and the collection "flushed" either after paragraphs, before headings of a certain level (you could specify which in the metadata) or wherever you put a code block with the text ^^^Notes^^^). I experimented with numbering HTML notes per section, but since that required either guessing the section numbers or generating them before running the notes filter I was never really satisfied with it.
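The collect-and-flush behaviour can be sketched in a few lines. This is an illustrative Python version over Pandoc's JSON AST, not the original filter: `place_notes`, `flush_level` and the choice to emit the notes as an ordered list are all assumptions. It relies on visiting the blocks in document order.

```python
NOTES_MARKER = "^^^Notes^^^"  # text of the code block that forces a flush


def extract_notes(node, state):
    """Walk an AST fragment in document order, replacing each Note with a
    numbered Superscript and queueing its blocks in state["pending"]."""
    if isinstance(node, list):
        return [extract_notes(child, state) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Note":
            state["count"] += 1
            state["pending"].append(node["c"])
            return {"t": "Superscript",
                    "c": [{"t": "Str", "c": str(state["count"])}]}
        return {key: extract_notes(value, state) for key, value in node.items()}
    return node


def flush(state, out):
    """Emit the queued notes as an ordered list and clear the queue."""
    if state["pending"]:
        start = state["count"] - len(state["pending"]) + 1
        list_attrs = [start, {"t": "Decimal"}, {"t": "Period"}]
        out.append({"t": "OrderedList", "c": [list_attrs, state["pending"]]})
        state["pending"] = []


def place_notes(blocks, flush_level=1):
    """Flush notes before headings up to flush_level, at marker code
    blocks, and at the end of the document."""
    state = {"count": 0, "pending": []}
    out = []
    for block in blocks:
        is_heading = block["t"] == "Header" and block["c"][0] <= flush_level
        is_marker = (block["t"] == "CodeBlock"
                     and block["c"][1].strip() == NOTES_MARKER)
        if is_heading or is_marker:
            flush(state, out)
        if not is_marker:
            out.append(extract_notes(block, state))
    flush(state, out)
    return out
```

Here the numbering runs continuously through the document. Numbering per section would need the section numbers to be known at this point, which is the difficulty mentioned above.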

If there is interest I can clean up and document either or both of these filters (although not until after mid August). The LaTeX filter could be ported to Lua, but the HTML filter probably couldn't since it seems Lua filters don't traverse documents in linear order (@tarleb is this correct?), while the order in which Pandoc::Filter traverses the AST agrees with the order sections appear in the finished document.

bpj avatar Jul 19 '19 14:07 bpj

> The LaTeX filter could be ported to Lua, but the HTML filter probably couldn't since it seems Lua filters don't traverse documents in linear order (@tarleb is this correct?)

Unfortunately, yes. One could write a linear order walk function in Lua as a workaround, but ultimately we need to fix it in Haskell.

tarleb avatar Jul 23 '19 05:07 tarleb

@tarleb I forget, do we have an existing issue about this (the order of traversal)?

jgm avatar Jul 23 '19 19:07 jgm

We have issue #4456 to track this.

Edit: nope, bollocks: that's a completely different issue. I'll look again and create one if necessary.

tarleb avatar Jul 24 '19 05:07 tarleb

As of 2019-07-30 there is no such issue as far as I can see so I'll go ahead and create one.

bpj avatar Jul 30 '19 08:07 bpj

I'm not sure whether this issue is dead, but I'll add my own suggestion nonetheless. I wouldn't complicate the footnote/endnote syntax any further, but rather introduce something like footnote namespaces, which could be separated from the footnote identifier by a colon or similar, like [^footnote:5] or [^endnote:4]. By default these would all render the same, and it would be up to the particular template or some configuration how they are rendered in the final document. This would leave room to define any number of footnote series (each of which could have its own numbering style, position, alignment, etc.), unconstrained by predefined categories such as endnotes, sidenotes and footnotes.

My current use case is the following: I'm translating a book. In the translated document there are two sets of notes: (1) endnotes from the original author and (2) notes from the translator (me), which I'd like to distinguish in some way. I'd like to preserve the numbering of the original notes, but my notes inserted between them change their numbers. However, these notes aren't necessarily distinguished by their position (it does not matter whether they are footnotes or endnotes).
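To make the namespace idea concrete, here is a small sketch, assuming the raw note ids are available in order of reference. The helper names and the `author`/`translator` series are hypothetical. Each series keeps its own counter, so notes from one series never shift the numbers of another.

```python
from collections import defaultdict


def split_namespace(raw_id, default="footnote"):
    """Split 'endnote:4' into ('endnote', '4').

    Ids without a usable 'namespace:' prefix fall into the default series.
    """
    namespace, sep, ident = raw_id.partition(":")
    if sep and namespace and ident:
        return namespace, ident
    return default, raw_id


def number_by_series(raw_ids):
    """Number each note within its own series, in order of first reference."""
    counters = defaultdict(int)
    numbers = {}
    for raw_id in raw_ids:
        if raw_id in numbers:
            continue  # repeated reference to the same note
        series, _ = split_namespace(raw_id)
        counters[series] += 1
        numbers[raw_id] = (series, counters[series])
    return numbers


refs = ["author:a", "translator:x", "author:b"]
print(number_by_series(refs))
# {'author:a': ('author', 1), 'translator:x': ('translator', 1), 'author:b': ('author', 2)}
```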

onvlt avatar Sep 25 '19 18:09 onvlt

@bpj -- did you ever rewrite your filters into Lua?

Adding more than one note stream would of course still be a very welcome change to the AST, whatever format may be chosen...

iandol avatar Jun 04 '21 04:06 iandol

Oh I just saw this is at least on the Roadmap:

https://github.com/jgm/pandoc/wiki/Roadmap https://github.com/jgm/pandoc-types/pull/34 https://github.com/jgm/pandoc/pull/4042

👍🏾

iandol avatar Jun 04 '21 04:06 iandol

Just wanted to add my hope that endnotes can be supported. My rationale is here. Many thanks.

donboyd5 avatar Apr 18 '22 13:04 donboyd5