pandoc
Multiline titles do not display in OpenDocument/ODT files
Using pandoc 2.19.2 on macOS 12.5.1, a multiline title will not display correctly in an .odt file because of nested paragraphs.
Ordinarily it is possible to add a linebreak to a title using YAML multiline block scalars, like so:
testing.md:
---
title: |
  This *is* \
  a test
author: Someone
---
Testing.
When using a multiline entry like this, the metadata entry for title is treated as block content instead of inline content. This means that when rendering to HTML, the title content is wrapped in <p> tags:
pandoc testing.md -o testing.html --standalone
<h1 class="title"><p>This <em>is</em><br />
a test</p></h1>
When using a single line, like
---
title: This *is* a test
author: Someone
---
Testing.
...the content isn't wrapped in <p>s:
<h1 class="title">This <em>is</em> a test</h1>
With HTML output, browsers render a <p> inside an <h1> without trouble (even though it is not strictly conforming HTML, since <h1> expects phrasing content), so there are no visible problems.
However, with ODT output, the document title will not show up when using linebreaks in a YAML key.
For instance, running
testing.md
---
title: |
  This *is* \
  a test
author: Someone
---
Testing.
pandoc testing.md -o testing.odt
creates a document that looks like this:
There's an empty paragraph where the title should be.
Inspecting the content.xml file inside the zipped .odt file shows that the title really is there, but it's wrapped in an extra <text:p> tag, much like the <p> in the HTML output:
<text:p text:style-name="Title">
<text:p text:style-name="Text_20_body">This <text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>
</text:p>
Placing a <text:p> tag inside another <text:p> tag is invalid OpenDocument XML (ODT is not OOXML), and both LibreOffice and Word are unable to display it.
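To check for this without unzipping the file by hand, here is a minimal Python sketch that scans content.xml for a text:p sitting directly inside another text:p. The function names find_nested_paragraphs and check_odt are made up for illustration. The sketch only assumes the standard OpenDocument text namespace and that the archive contains a content.xml entry, as described above.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument "text" namespace, i.e. the text: prefix in the XML shown above.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P_TAG = f"{{{TEXT_NS}}}p"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"


def find_nested_paragraphs(xml_text):
    """Return (outer_style, inner_style) for each text:p directly inside a text:p."""
    root = ET.fromstring(xml_text)
    nested = []
    for outer in root.iter(P_TAG):
        # findall with a bare tag only matches direct children.
        for inner in outer.findall(P_TAG):
            nested.append((outer.get(STYLE_ATTR), inner.get(STYLE_ATTR)))
    return nested


def check_odt(path):
    """Read content.xml out of an .odt archive and report nested paragraphs."""
    with zipfile.ZipFile(path) as odt:
        xml_text = odt.read("content.xml")
    return find_nested_paragraphs(xml_text)
```

Calling check_odt("testing.odt") on the file generated above should report a Title paragraph containing a Text_20_body paragraph if the nesting is present, and an empty list otherwise.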
This is similar to these issues:
- Multiline titles don't display when rendering to EPUB: https://github.com/jgm/pandoc/issues/8091 and https://github.com/jgm/pandoc/issues/8095. One solution there is to use line-block style values, like:
  title: |
    | This *is*
    | a test
  but doing this with ODT still creates the nested <text:p> issue.
- Here (https://github.com/jgm/pandoc/issues/7262), when using JATS output, pandoc wraps some elements like <label> in <p> tags. The solution there is to use a Lua filter to remove the <p>s before rendering the document.
I tried making a similar Lua filter, but it doesn't work correctly, likely because I'm trying to mess with the metadata title and using pandoc.RawInline() there is bad/invalid/illegal.
fix-multiline-title.lua:
if FORMAT:match 'odt' then
  function fix_odt(x)
    local result
    result = pandoc.write(pandoc.Pandoc(x), 'odt')
    return result:match('^<text:p text:style-name="Text_20_body">(.*)</text:p>$') or result or ''
  end

  function Meta (m)
    m.title = pandoc.RawInline("odt", fix_odt(m.title[1]))
    return m
  end
end
Ideally, the multiline title should show up like this in content.xml:
<text:p text:style-name="Title">This <text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>
…without nested <text:p> tags
Whoa, one super easy fix that requires no additional work is to chomp the final newline in the YAML multiline block by adding a - after the | (that is, |-):
testing.md
---
title: |-
  This *is* \
  a test
author: Someone
---
Testing.
pandoc testing.md -o testing.odt
This produces a document with the correct XML:
<text:p text:style-name="Title">This <text:span text:style-name="T1">is</text:span><text:line-break/>a test</text:p>
And the title displays correctly in the rendered document.
The presence/absence of the - chomping indicator leads to some fragility across formats. For instance, both
title: |
  This *is* \
  a test
and
title: |-
  This *is* \
  a test
work in Word and HTML and LaTeX. It's only ODT where it makes a difference.
That's good that there's a workaround. The fix should be pretty easy too; we can just force reading this metadata field as Inline content.
Related issue in docx writer: https://groups.google.com/d/msgid/pandoc-discuss/9161ae41-4195-40b7-9e60-679e2b7c49e4n%40googlegroups.com?utm_medium=email&utm_source=footer
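Until such a fix lands, the same idea (treating the title as inline content) could be approximated from outside pandoc with a JSON filter. The following is only a sketch under assumptions, not the fix proposed above. It assumes the title arrives in pandoc's JSON AST as a MetaBlocks value holding only Para or Plain blocks. The names flatten_title and main are made up for illustration, and whether the ODT writer then emits the desired XML has not been verified here.

```python
import json
import sys


def flatten_title(doc):
    """Turn a MetaBlocks title made only of Para/Plain blocks into MetaInlines."""
    title = doc.get("meta", {}).get("title")
    if not title or title.get("t") != "MetaBlocks":
        return doc  # already inline content (or absent): nothing to do
    inlines = []
    for block in title["c"]:
        if block.get("t") not in ("Para", "Plain"):
            return doc  # leave anything more complex untouched
        if inlines:
            # Separate consecutive blocks with a hard line break.
            inlines.append({"t": "LineBreak"})
        inlines.extend(block["c"])
    doc["meta"]["title"] = {"t": "MetaInlines", "c": inlines}
    return doc


def main():
    """Filter entry point: pandoc sends the JSON AST on stdin and reads stdout."""
    json.dump(flatten_title(json.load(sys.stdin)), sys.stdout)
```

To try it as a filter, save it as, say, flatten_title.py (a hypothetical name), add a call to main() at the bottom, and run pandoc testing.md -o testing.odt --filter ./flatten_title.py.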