
ODT writer makes one hardcoded style set for each list/entry-level found in the MD source

Open chrisaga opened this issue 10 months ago • 8 comments

Pandoc dynamically generates redundant list styles, which is a problem when you need some customization.

I used pandoc -o test.odt test-odt-list.md to generate an ODT document from an MD source containing 2 nested bullet lists and 2 nested ordered lists.

Pandoc's ODT writer generated 6 different text:list-style elements named L1 to L6 and 6 different style:style elements named P1 to P6:

  • One of each for every bullet list level
    • The first bullet list has 2 levels and uses the following styles:
      • <text:list text:style-name="L1"> for the whole list
      • <text:p text:style-name="P1"> for the first level
      • <text:p text:style-name="P2"> for the second level
    • The second list is a duplicate of the first one and uses styles "L3", "P3", and "P4".
  • One of each for every ordered list
    • The first ordered list uses the following styles:
      • <text:list text:style-name="L5"> for the whole list
      • <text:p text:style-name="P5"> for every entry, whatever the level
    • The second ordered list uses styles "L6" and "P6"
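
A minimal Python sketch for checking these counts on a generated file. It assumes the generated styles sit under office:automatic-styles in content.xml; the function names automatic_style_names and automatic_style_names_in_odt are hypothetical helpers, not part of pandoc:

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
STYLE_NAME = "{%s}name" % NS["style"]
STYLE_FAMILY = "{%s}family" % NS["style"]


def automatic_style_names(content_xml):
    """Return the names of automatic list styles and paragraph styles."""
    root = ET.fromstring(content_xml)
    auto = root.find("office:automatic-styles", NS)
    if auto is None:
        return {"list": [], "paragraph": []}
    lists = [el.get(STYLE_NAME) for el in auto.findall("text:list-style", NS)]
    paragraphs = [
        el.get(STYLE_NAME)
        for el in auto.findall("style:style", NS)
        if el.get(STYLE_FAMILY) == "paragraph"
    ]
    return {"list": lists, "paragraph": paragraphs}


def automatic_style_names_in_odt(path):
    """Same as above, reading content.xml from an ODT file (a zip archive)."""
    with zipfile.ZipFile(path) as odt:
        return automatic_style_names(odt.read("content.xml"))
```

Calling automatic_style_names_in_odt("test.odt") on the file produced by the command above should show how many list and paragraph styles the writer generated.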

N.B. No custom reference.odt was used.

Why is it a problem?

  1. It makes list customization very difficult, even by post-processing the ODT file.
  2. Unlike the ordered lists, the bullet lists don't use the list hierarchy correctly, which makes the problem worse.
  3. It generates extra code in the ODT file.

Remark

I don't see the point of generating dynamic hardcoded list styles since Pandoc has an embedded reference.odt. Wouldn't it be easier to define the list styles in reference.odt and use them in the writer?

I confess, I had a look at ODT.hs and OpenDocument.hs, but I definitely cannot read Haskell code. So maybe there is a good reason for the hardcoded styles.

Pandoc version 3.1.8 on Ubuntu 22.04.3 LTS

pandoc 3.1.8
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/chris/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no warranty, not even for merchantability or fitness for a particular purpose.

MD Source code


# Un Titre

## Des listes

Une liste :

* bla
	- lala
	- lili
	- lolo
* blabla
* blablabla

## Un Sous-titre

Et encore des puces ?

* bla
	- lala
	- lili
	- lolo
* blabla
* blablabla

Numéros :

1) Un

   1) Un point un

      1) Un point un point un
      2) Un point un point deux
   2) Un point deux

2) Deux

Encore des numéros ?

1) Un

   1) Un point un

      1) Un point un point un
      2) Un point un point deux
   2) Un point deux

2) Deux

chrisaga avatar Oct 14 '23 13:10 chrisaga

I agree, this has always puzzled me about the opendocument writer. The original author is no longer active with the project, so I don't think we can find out what the motivation was.

I'm open to exploring improvements here, but I don't have a good enough grasp of the opendocument/ODT format to take this on myself.

jgm avatar Oct 16 '23 04:10 jgm

I'm open to exploring improvements here, but I don't have a good enough grasp of the opendocument/ODT format to take this on myself.

Thanks @jgm. The bullet list case seems quite straightforward since there is only one flavor in pandoc.

  1. Provide a reference.odt with one default bullet list style and one paragraph style for the list items. I can do that.
  2. Modify OpenDocument.hs to use these two styles instead of dynamic ones.
  3. Modify OpenDocument.hs to stop generating those dynamic styles.

As a matter of fact, I already did a proof of concept with a custom reference.odt. I used style names identical to those generated by the writer. It works as long as there is only one list in the document.

The ordered list case is a little bit more complicated since there are multiple flavors (number style) and toppings (separator). Maybe we could:

  1. Do the same as for bullet lists with 2 default styles (one list, one paragraph) built in the reference.odt. I can do that too.
  2. Generate dynamic styles which would inherit from the default styles. I can propose XML definitions of the styles but not implement them in Haskell.

chrisaga avatar Oct 16 '23 19:10 chrisaga

Another complication is that you have margins as part of these styles: e.g.,

          <style:list-level-label-alignment text:label-followed-by="listtab" text:list-tab-stop-position="1.5in" fo:text-indent="-0.25in" fo:margin-left="1.5in" />

The margin needed for a list is not predictable from the list level alone, since the list may be embedded in another construction that is indented, e.g. a block quote.

jgm avatar Oct 17 '23 04:10 jgm

The margin needed for a list is not predictable from the list level alone, since the list may be embedded in another construction that is indented, e.g. a block quote.

This is right, but only because the writer doesn't handle things like block quotes the way it should. It loses the document structure here and simply generates a paragraph style with a bigger margin. That's yet another issue. It should do it this way:

<text:p text:style-name="Quotations">Quoted text, output with the "Quotations" paragraph style</text:p>
<text:list xml:id="list-id" text:style-name="list-style-name">
  <text:list-item>
    <text:p text:style-name="paragraph-style-name">List item</text:p>
  </text:list-item>
</text:list>

A single paragraph style should do for every level of nested list items. It inherits its margins from the Quotations style and refers to the list style (bullet character, etc.), like this:

<style:style style:name="paragraph-style-name" style:family="paragraph" style:parent-style-name="Quotations" style:list-style-name="list-style-name">
</style:style>
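
As an illustration only, the same definition could be generated programmatically. This is a minimal Python sketch, assuming the four attributes shown above are all that is needed; list_paragraph_style is a hypothetical helper and the style names are the placeholders used in this comment:

```python
import xml.etree.ElementTree as ET

STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
ET.register_namespace("style", STYLE)


def list_paragraph_style(name, parent, list_style):
    """Build a paragraph style that inherits from `parent` (e.g. Quotations)
    and is attached to the list style `list_style`."""
    return ET.Element(
        "{%s}style" % STYLE,
        {
            "{%s}name" % STYLE: name,
            "{%s}family" % STYLE: "paragraph",
            "{%s}parent-style-name" % STYLE: parent,
            "{%s}list-style-name" % STYLE: list_style,
        },
    )


# Placeholder names, as in the XML above
element = list_paragraph_style("paragraph-style-name", "Quotations", "list-style-name")
print(ET.tostring(element, encoding="unicode"))
```

Only the parent style changes between contexts, so a list inside a block quote and a list in body text can share the same list style.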

No other definitions should be needed for this one, but I suggest that we define the style in reference.odt to allow any future customization.

chrisaga avatar Oct 17 '23 12:10 chrisaga

How would the Quotations style be defined? If the definition includes indentation, how does that interact with definition of list styles?

jgm avatar Oct 17 '23 15:10 jgm

The Quotations paragraph style mostly defines a left and a right margin. Those are inherited by the paragraph style used for the list items. There is no single way to define a list style. The one I found most interesting for us globally defines 10 levels of nested lists, with the following for each level:

  • the bullet character (text:bullet-char)
  • the indentation relative to the paragraph margin (text:space-before)
  • the minimum width of the label (text:min-label-width)

It is a little bit different from what Pandoc's ODT writer currently does.
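
A minimal Python sketch of such a list style, built with the three attributes listed above. The style name, bullet characters, and lengths are hypothetical values chosen for illustration, and bullet_list_style is not an existing pandoc function:

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
ET.register_namespace("text", TEXT)
ET.register_namespace("style", STYLE)


def bullet_list_style(name, bullets=("•", "◦", "▪"), levels=10,
                      step_cm=0.5, label_cm=0.5):
    """Build one text:list-style covering `levels` nesting levels.

    Each level gets a bullet character, an indentation relative to the
    paragraph margin (text:space-before) and a minimum label width
    (text:min-label-width). All values here are assumptions.
    """
    list_style = ET.Element("{%s}list-style" % TEXT, {"{%s}name" % STYLE: name})
    for level in range(1, levels + 1):
        level_style = ET.SubElement(
            list_style,
            "{%s}list-level-style-bullet" % TEXT,
            {
                "{%s}level" % TEXT: str(level),
                # Cycle through the available bullet characters
                "{%s}bullet-char" % TEXT: bullets[(level - 1) % len(bullets)],
            },
        )
        ET.SubElement(
            level_style,
            "{%s}list-level-properties" % STYLE,
            {
                "{%s}space-before" % TEXT: "%.2fcm" % (step_cm * (level - 1)),
                "{%s}min-label-width" % TEXT: "%.2fcm" % label_cm,
            },
        )
    return list_style


style = bullet_list_style("HypotheticalBulletList")
print(ET.tostring(style, encoding="unicode"))
```

Because the indentation is expressed relative to the paragraph margin, the same definition can be reused wherever the list appears.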

As a result, the position of the bullet for an item of a list nested in another list inside a block quote will be:

p = Quotations paragraph style left margin + value of(text:space-before at level 2)
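
The formula can be worked through with a small Python sketch. The margin and indentation values are hypothetical, in inches, and label_position is an illustrative helper:

```python
def label_position(paragraph_margin_left, space_before_by_level, level):
    """Bullet position = paragraph left margin + text:space-before of the level."""
    return paragraph_margin_left + space_before_by_level[level]


# Hypothetical values, in inches
quotations_margin_left = 0.5                 # left margin of the Quotations style
space_before = {1: 0.0, 2: 0.25, 3: 0.5}     # text:space-before per list level

# Second-level item of a list inside a block quote
position = label_position(quotations_margin_left, space_before, 2)
print(position)
```

Outside a block quote only the first argument changes, which is why the list style itself does not need to know where the list is embedded.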

chrisaga avatar Oct 17 '23 16:10 chrisaga

indentation relative to the paragraph margin

I see, so this should work fine within a block quote, because the block quote will simply adjust the paragraph left margin?

In that case, I agree that this approach would be best!

jgm avatar Oct 17 '23 17:10 jgm

I finally had some time to spend on this issue and made a proof of concept with a Lua custom writer. So far I have got nested bullet and ordered lists working inside Quotations blocks. Most of the block and inline elements work with styles defined in the reference.odt document, so there is no need for a bunch of dynamic styles. Styles and document structure comply with the OpenDocument v1.3 specification and were tested with LibreOffice 7.3.7.2.

The POC is available here: https://github.com/chrisaga/hk-pandoc-writers/tree/main/odt

chrisaga avatar Feb 11 '24 13:02 chrisaga