pandoc
ODT writer makes one hardcoded style set for each list/entry-level found in the MD source
Pandoc dynamically generates redundant list styles, which is a problem when you need some customization.
I used pandoc -o test.odt test-odt-list.md to generate an ODT document from an MD source containing 2 nested bullet lists and 2 nested ordered lists.
Pandoc's ODT writer generated 6 different text:list-style elements named L1 to L6 and 6 different style:style elements named P1 to P6:
- One of each for every bullet list level
  - The first bullet list has 2 levels and uses the following styles:
    - <text:list text:style-name="L1"> for the whole list
    - <text:p text:style-name="P1"> for the first level
    - <text:p text:style-name="P2"> for the second level
  - The second list is a duplicate of the first one and uses styles "L3", "P3", and "P4".
- One of each for every ordered list
  - The first ordered list uses the following styles:
    - <text:list text:style-name="L5"> for the whole list
    - <text:p text:style-name="P5"> for every entry, whatever the level
  - The second ordered list uses styles "L6" and "P6".
N.B. No custom reference.odt was used.
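To check which automatic styles the writer emitted, one option is to read content.xml out of the ODT archive and list the names under office:automatic-styles. The following is a minimal sketch, not part of pandoc; the function names are hypothetical, and it assumes the styles of interest live in content.xml rather than styles.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def read_content_xml(odt_path):
    """Return content.xml as a string (an ODT file is a zip archive)."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.read("content.xml").decode("utf-8")


def automatic_style_names(content_xml):
    """Return (list style names, paragraph style names) found under
    office:automatic-styles. Hypothetical helper for illustration."""
    root = ET.fromstring(content_xml)
    auto = root.find("office:automatic-styles", NS)
    if auto is None:
        return [], []
    name_attr = "{%s}name" % NS["style"]
    family_attr = "{%s}family" % NS["style"]
    list_styles = [e.get(name_attr) for e in auto.findall("text:list-style", NS)]
    para_styles = [
        e.get(name_attr)
        for e in auto.findall("style:style", NS)
        if e.get(family_attr) == "paragraph"
    ]
    return list_styles, para_styles
```

Calling automatic_style_names(read_content_xml("test.odt")) on the file generated above should return the generated list and paragraph style names, which makes it easy to compare runs before and after changing the source.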
Why is it a problem?
- It makes list customization very difficult, even when post-processing the ODT file.
- Unlike the ordered lists, the bullet lists don't use the list hierarchy correctly, which makes the problem worse.
- It generates extra code in the ODT file.
Remark
I don't see the point of generating hardcoded list styles dynamically, since Pandoc has an embedded reference.odt. Wouldn't it be easier to define the list styles in reference.odt and use them in the writer?
I confess, I had a look at ODT.hs and OpenDocument.hs, but I definitely cannot read Haskell code. So maybe there is a good reason for the hardcoded styles.
Pandoc version 3.1.8 on Ubuntu 22.04.3 LTS
pandoc 3.1.8
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/chris/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
MD Source code
# Un Titre

## Des listes

Une liste :

* bla
  - lala
  - lili
  - lolo
* blabla
* blablabla

## Un Sous-titre

Et encore des puces ?

* bla
  - lala
  - lili
  - lolo
* blabla
* blablabla

Numéros :

1) Un
    1) Un point un
        1) Un point un point un
        2) Un point un point deux
    2) Un point deux
2) Deux

Encore des numéros ?

1) Un
    1) Un point un
        1) Un point un point un
        2) Un point un point deux
    2) Un point deux
2) Deux
I agree, this has always puzzled me about the opendocument writer. The original author is no longer active with the project, so I don't think we can find out what the motivation was.
I'm open to exploring improvements here, but I don't have a good enough grasp of the opendocument/ODT format to take this on myself.
> I'm open to exploring improvements here, but I don't have a good enough grasp of the opendocument/ODT format to take this on myself.
Thanks @jgm. The bullet list case seems to be quite straightforward, since there is only one flavor in pandoc.
- Provide a reference.odt with one default bullet list style and one paragraph style for the list items. I can do that.
- Modify OpenDocument.hs to use these two styles instead of dynamic ones.
- Modify OpenDocument.hs to stop generating those dynamic styles.
As a matter of fact, I already did a proof of concept with a custom reference.odt. I used style names identical to those generated by the writer. It works as long as there is only one list in the document.
The ordered list case is a little more complicated, since there are multiple flavors (number style) and toppings (separator). Maybe we could:
- Do the same as for bullet lists, with 2 default styles (one list, one paragraph) built into the reference.odt. I can do that too.
- Generate dynamic styles which would inherit from the default styles. I can propose XML definitions of the styles, but not implement them in Haskell.
Another complication is that you have margins as part of these styles: e.g.,
<style:list-level-label-alignment text:label-followed-by="listtab" text:list-tab-stop-position="1.5in" fo:text-indent="-0.25in" fo:margin-left="1.5in" />
The margin needed for a list is not predictable from the list level alone, since the list may be embedded in another construction that is indented, e.g. a block quote.
> The margin needed for a list is not predictable from the list level alone, since the list may be embedded in another construction that is indented, e.g. a block quote.
This is right, but only because the writer doesn't handle things like block quotes the way it should. It loses the document structure here and simply generates a paragraph style with a bigger margin. That's yet another issue. It should be done this way:
<text:p text:style-name="Quotations">Output quoted text with the "Quotations" paragraph style</text:p>
<text:list xml:id="list-id" text:style-name="list-style-name">
  <text:list-item>
    <text:p text:style-name="paragraph-style-name">List item</text:p>
  </text:list-item>
</text:list>
One paragraph style should suffice for every level of nested list items. It inherits the margins of the Quotations style and is tied to the list style (bullet character, etc.), like this:
<style:style style:name="paragraph-style-name" style:family="paragraph" style:parent-style-name="Quotations" style:list-style-name="list-style-name">
</style:style>
No other definitions should be needed for this one, but I suggest that we use a style in reference.odt to allow for any future customization.
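For experimenting outside the Haskell writer, such a style element can be produced programmatically. This is a small sketch under stated assumptions: the helper name list_paragraph_style is hypothetical, and the style names are the placeholders used above, not names pandoc defines.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument style namespace.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
ET.register_namespace("style", STYLE_NS)


def list_paragraph_style(name, parent, list_style):
    """Build a style:style paragraph element that inherits from `parent`
    and is attached to the list style `list_style`.
    Hypothetical helper for illustration only."""
    el = ET.Element("{%s}style" % STYLE_NS)
    el.set("{%s}name" % STYLE_NS, name)
    el.set("{%s}family" % STYLE_NS, "paragraph")
    el.set("{%s}parent-style-name" % STYLE_NS, parent)
    el.set("{%s}list-style-name" % STYLE_NS, list_style)
    return el


style = list_paragraph_style("paragraph-style-name", "Quotations", "list-style-name")
# Serialize to inspect the result; the namespace declaration is added automatically.
print(ET.tostring(style, encoding="unicode"))
```

Only the parent style changes between a list in normal text and a list inside a block quote, so the same helper covers both cases.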
How would the Quotations style be defined? If the definition includes indentation, how does that interact with definition of list styles?
The Quotations paragraph style mostly defines a left and a right margin. Those are inherited by the paragraph style used for the list items.
There is no single way to define a list style. The one I found most interesting for us globally defines 10 levels of nested lists, with, for each level:
- the bullet character (text:bullet-char)
- the indentation relative to the paragraph margin (text:space-before)
- the minimum width of the label (text:min-label-width)
It is a little different from what Pandoc's ODT writer currently does.
As a result, the bullet's position of an item of a list nested in another list in a block quote will be:
p = Quotations paragraph style left margin + value of(text:space-before at level 2)
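As a worked example of that formula, here is a small sketch. All lengths are hypothetical values in inches chosen for illustration; they are not taken from pandoc's reference.odt or from any real list style.

```python
def bullet_position(paragraph_margin_left, space_before_by_level, level):
    """Bullet position = left margin of the enclosing paragraph style
    + text:space-before of the list style at the given level."""
    return paragraph_margin_left + space_before_by_level[level]


# Hypothetical values, in inches.
QUOTATIONS_MARGIN_LEFT = 0.5            # assumed left margin of "Quotations"
SPACE_BEFORE = {1: 0.25, 2: 0.5, 3: 0.75}  # assumed text:space-before per level

# A level-2 item of a list nested inside a block quote:
print(bullet_position(QUOTATIONS_MARGIN_LEFT, SPACE_BEFORE, 2))  # 1.0
# The same item outside any block quote (paragraph margin 0):
print(bullet_position(0.0, SPACE_BEFORE, 2))  # 0.5
```

The list style itself is unchanged in both cases; only the paragraph margin differs, which is why the list level alone is enough to pick the list style.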
> indentation relative to the paragraph margin
I see, so this should work fine within a block quote, because the block quote will simply adjust the paragraph left margin?
In that case, I agree that this approach would be best!
I finally had some time to spend on this issue and made a proof of concept with a Lua custom writer. So far I have bullet and ordered nested lists working with Quotations blocks. Most of the block and inline elements work with styles defined in the reference.odt document, so there is no need for a bunch of dynamic styles. The styles and document structure comply with the OpenDocument v1.3 specification and were tested with LibreOffice 7.3.7.2.
The POC is available here: https://github.com/chrisaga/hk-pandoc-writers/tree/main/odt