
Multiline titles do not display in OpenDocument/ODT files

Open andrewheiss opened this issue 3 years ago • 3 comments

Using pandoc 2.19.2 on macOS 12.5.1, a multiline title will not display correctly in an .odt file because of nested paragraphs.


Ordinarily it is possible to add a linebreak to a title using YAML multiline block scalars, like so:

testing.md:

---
title: |
  This *is* \
  a test
author: Someone
---

Testing.

When using a multiline entry like this, the metadata entry for title is treated as block content instead of inline content. This means that when rendering to HTML, the title content is wrapped in <p> tags:

pandoc testing.md -o testing.html --standalone
<h1 class="title"><p>This <em>is</em><br />
a test</p></h1>

When using a single line, like

---
title: This *is* a test
author: Someone
---

Testing.

...the content isn't wrapped in <p>s:

<h1 class="title">This <em>is</em> a test</h1>

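With HTML output, browsers render a <p> inside an <h1> without complaint (even though it is not strictly valid HTML, since headings only allow phrasing content), so there are no visible problems.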

However, with ODT output, the document title will not show up when the YAML title value contains line breaks.

For instance, running

testing.md

---
title: |
  This *is* \
  a test
author: Someone
---

Testing.

pandoc testing.md -o testing.odt

creates a document that looks like this:

[screenshot of the rendered document]

There's an empty paragraph where the title should be.

Inspecting the content.xml file inside the zipped .odt file shows that the title really is there, but it's wrapped in an extra <text:p> tag, just as the HTML title is wrapped in <p>:

<text:p text:style-name="Title">
  <text:p text:style-name="Text_20_body">This <text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>
</text:p>

Placing a <text:p> tag directly inside another <text:p> tag is invalid OpenDocument XML, and both LibreOffice and Word are unable to display it.
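
A quick way to check for this without opening the file in an office suite is to scan content.xml for a <text:p> that has another <text:p> as a direct child. The following is only a minimal Python sketch using the standard library; the function names `directly_nested` and `check_odt` are made up for illustration, and the sample XML is the fragment shown above wrapped in a bare <office:text> element so that it parses on its own. Only direct children are checked, because indirect nesting (for example through a text box or a footnote body) can be legitimate.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by OpenDocument text content.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
P_TAG = f"{{{TEXT_NS}}}p"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"


def directly_nested(content_xml) -> list[str]:
    """Return the style names of <text:p> elements that have a <text:p> child.

    Hypothetical helper for illustration; accepts str or bytes.
    """
    root = ET.fromstring(content_xml)
    return [
        p.get(STYLE_ATTR, "")
        for p in root.iter(P_TAG)
        if any(child.tag == P_TAG for child in p)
    ]


def check_odt(path_or_file) -> list[str]:
    """Run the check on content.xml inside an .odt file (a zip archive)."""
    with zipfile.ZipFile(path_or_file) as odt:
        return directly_nested(odt.read("content.xml"))


# The problematic and the desired title markup from this issue.
WRAPPER = f'<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">{{}}</office:text>'
BAD = WRAPPER.format(
    '<text:p text:style-name="Title">'
    '<text:p text:style-name="Text_20_body">This '
    '<text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>'
    '</text:p>'
)
GOOD = WRAPPER.format(
    '<text:p text:style-name="Title">This '
    '<text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>'
)

if __name__ == "__main__":
    print("nested in BAD: ", directly_nested(BAD))
    print("nested in GOOD:", directly_nested(GOOD))
```

Calling `check_odt("testing.odt")` on a generated file should list the style name of every paragraph that directly contains another paragraph, and an empty list means no such nesting was found.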

This is similar to these issues:

  • Multiline titles don't display when rendering to EPUB: https://github.com/jgm/pandoc/issues/8091 and https://github.com/jgm/pandoc/issues/8095. One solution there is to use line-block style values, like:

    title: |
        | This *is*
        | a test
    

    but doing this with ODT still creates the nested <text:p> issue.

  • Here (https://github.com/jgm/pandoc/issues/7262), when using JATS output, pandoc wraps some elements like <label> in <p> tags. The solution there is to use a Lua filter to remove the <p>s before rendering the document.

    I tried making a similar Lua filter, but it doesn't work correctly, probably because I'm modifying the metadata title and using pandoc.RawInline() there is invalid:

    fix-multiline-title.lua

    if FORMAT:match 'odt' then
      function fix_odt(x)
        local result
        result = pandoc.write(pandoc.Pandoc(x), 'odt')
        return result:match('^<text:p text:style-name="Text_20_body">(.*)</text:p>$') or result or ''
      end
    
      function Meta (m)
        m.title = pandoc.RawInline("odt", fix_odt(m.title[1]))
    
        return m
      end
    end
    

Ideally, the multiline title should show up like this in content.xml:

<text:p text:style-name="Title">This <text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>

…without nested <text:p> tags

andrewheiss, Aug 29 '22 16:08

Whoa, one super easy fix that requires no additional work is to chomp the final newline in the YAML multiline block with a - (that is, use |- instead of |):

testing.md

---
title: |-
  This *is* \
  a test
author: Someone
---

Testing.

pandoc testing.md -o testing.odt

This produces a document with the correct XML:

<text:p text:style-name="Title">This <text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>

And it displays correctly:

[screenshot of the rendered document]

The presence/absence of the - chomping indicator leads to some fragility across formats. For instance, both

title: |
  This *is* \
  a test

and

title: |-
  This *is* \
  a test

work in Word and HTML and LaTeX. It's only ODT where it makes a difference.
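
To make the difference between | and |- concrete, here is a rough Python sketch of what YAML chomping does to the scalar value. The `chomp` helper is hypothetical (it is not a pandoc or YAML-library function) and only models the three chomping modes from the YAML spec: clip (the default), strip (-), and keep (+). As far as I can tell, the only difference pandoc sees is whether the title string ends in a newline, which would explain why one form ends up as block content and the other as inline content.

```python
def chomp(raw: str, indicator: str = "") -> str:
    """Apply YAML block-scalar chomping to `raw`.

    Illustrative helper only. `indicator` is "" (clip: keep one final
    line break), "-" (strip: remove all final line breaks), or
    "+" (keep: leave final line breaks untouched).
    """
    if indicator not in ("", "-", "+"):
        raise ValueError(f"unknown chomping indicator: {indicator!r}")
    if indicator == "+":
        return raw
    body = raw.rstrip("\n")
    if indicator == "-" or not body:
        # Strip mode, or a scalar with no content at all.
        return body
    # Clip mode: keep a single final line break if there was at least one.
    return body + "\n" if raw.endswith("\n") else body


# The literal text of the title block from testing.md, before chomping.
RAW_TITLE = "This *is* \\\na test\n"

if __name__ == "__main__":
    for indicator in ("", "-"):
        value = chomp(RAW_TITLE, indicator)
        print(f"title: |{indicator}  gives {value!r}")
```

With `|` the value keeps its trailing newline, and with `|-` it does not; everything else in the string is identical.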

andrewheiss, Aug 29 '22 16:08

That's good that there's a workaround. The fix should be pretty easy too; we can just force reading this metadata field as Inline content.

jgm, Aug 29 '22 22:08

Related issue in docx writer: https://groups.google.com/d/msgid/pandoc-discuss/9161ae41-4195-40b7-9e60-679e2b7c49e4n%40googlegroups.com?utm_medium=email&utm_source=footer

jgm, Sep 10 '22 03:09