Loss of style for titles and/or subtitles in .docx output document

Open sschaenz opened this issue 1 year ago • 23 comments

I am using a customised ‘reference.docx’, which I created with the command ‘pandoc -o custom-reference.docx --print-default-data-file reference.docx’ and added individual styles for titles and subtitles. This file worked perfectly in Pandoc version 3.1.12 and formatted the title and subtitle as desired (e.g. frame with background colour and custom text colour).

However, after upgrading to Pandoc 3.5, the problem arises that titles and subtitles lose their formatting and the default formatting is used instead. The content is output correctly, but without the assigned formatting. This suggests that Pandoc may no longer recognise the custom styles or that the assignment of metadata to styles has been changed.

I am using a German version of Microsoft Word, so the style names in my ‘reference.docx’ file may correspond to the localised German names. Since I updated directly from Pandoc 3.1.12 to 3.5, I can't say exactly from which version the problem occurred. However, downgrading to 3.1.12 fixes the problem, so there seems to be a change in the newer Pandoc versions that affects the style assignment for metadata.

Regards Stefan

sschaenz avatar Oct 10 '24 16:10 sschaenz

Have you checked the changelog to see what relevant changes were made between 3.1.12 and 3.5? https://pandoc.org/releases

jgm avatar Oct 10 '24 17:10 jgm

Yes, I have read the release notes and found some possible clues, but they don’t help me resolve the issue as a user.

I have spent several hours researching and testing, but the documentation did not provide any insights to resolve my issue.

My understanding was that Pandoc’s reference file could be used to customize the layout of a Word document as long as only the Pandoc-supported styles were modified or utilized. This approach had always worked fine in the past. I also generated a new reference file using Pandoc 3.5 and ran targeted tests with it. However, when I modify the Title and Subtitle layouts (such as color or font), Word now uses the Standard layout style instead. This should not be happening.

I found the following relevant notes in the release history:

pandoc 3.2.1

“Clean up Abstract Title and Subtitle in default reference docx. Center Subtitle, remove color.”

  • This could be related to my problem, but it doesn’t help me find a solution.

pandoc 3.2

“Use current standard Word theme (#7280). This includes using the sans-serif font Aptos instead of the serif font Cambria, and default colors for headings. Remove duplicate DefaultParagraphFont in styles.xml.”

pandoc 3.1.12.2

Here’s one relevant note: “Detect caption by style name not id (#9518). The styleId can change depending on the localization.”

I suspect that my issue with Title and Subtitle formatting might be related to a change in how style names are detected in Pandoc. In earlier versions, Pandoc always processed the reference.docx file based on English style names, regardless of the language settings in Microsoft Word.

It would be helpful if Pandoc could provide a way to standardize style name recognition independently of localization. This would prevent issues for users working with different language versions of Word.

sschaenz avatar Oct 10 '24 18:10 sschaenz

What lang are you using? Does specifying lang: en make the problem go away?

jgm avatar Oct 10 '24 19:10 jgm

I had already tried both lang: de and lang: en, but neither brought any improvement.

Hm, my problem should actually be easy to recreate, right?

sschaenz avatar Oct 10 '24 20:10 sschaenz

Can you post files necessary to reproduce the issue? Your reference.docx and a sample markdown input, plus the command line you used?

jgm avatar Oct 10 '24 20:10 jgm

Note that in the 3.2 revisions, we added a "Title Char" style (standard for Word). It may be that you need to adjust this style in your reference doc.

jgm avatar Oct 10 '24 20:10 jgm

There is also "Subtitle Char". These are character styles.

jgm avatar Oct 10 '24 20:10 jgm

Sorry for the delay, I've been very busy.

Here is an example that reproduces the problem for version 3.5 of Pandoc.

  1. Display the Pandoc version:
$ pandoc --version
pandoc 3.5
  2. Generate the reference file from Pandoc (reference-3_5.docx):
$ pandoc -o reference-3_5.docx --print-default-data-file reference.docx
  3. Open the reference file in Word and adjust the title and subtitle format (my-reference-3_5.docx):

Title: I have set the font colour to white, the spacing from the top to 80 pt and coloured a background frame in blue.

Subtitle: Font colour also white, spacing adjusted to 4 pt from the top and background colour in a lighter blue.

  4. Create a Markdown example file (my-markdown.md):
---
title: "Resource Template"
subtitle: "Subtitle Here"
date: "\today"
author: "My Name"
toc: true
toc-depth: 2
lang: en
customer_logo: false
client_name: "Client Name"
security_label: "Confidential"
...

# Introduction

Provide a brief introduction here.

# Section 1: Overview

Include a general overview of the resource.

## Subsection 1.1: Details

Details about the resource, including any relevant information, such as objectives, target audience, or specifications.

# Section 2: Implementation

Instructions or steps for implementing the resource.

## Subsection 2.1: Steps

1. Step one details
2. Step two details
3. Step three details

# Section 3: Additional Information

Include any additional relevant information, like references or contact details.

# Appendix

Include any additional resources or appendices here.
  5. Result (my-word-3_5.docx):

The title and subtitle are displayed in the ‘Normal’ style, as are ‘author’ and ‘date’. The date is not displayed because \today is a LaTeX command and has no effect in Word output. If an actual date were entered, it would appear, but its formatting is also wrong. The page break that is in the template is missing as well.

Attached files: my-markdown.md, my-reference-3_5.docx, my-word-3_5.docx, reference-3_5.docx

sschaenz avatar Oct 13 '24 14:10 sschaenz

Thank you for pointing out the updates in the 3.2 revisions regarding the "Title Char" and "Subtitle Char" styles. However, I’m having trouble with these points because I haven’t been able to find any relevant information in the documentation. I’m not sure how to access or adjust these character styles within the reference doc. Could you provide some guidance on how I can use these features?

sschaenz avatar Oct 13 '24 14:10 sschaenz

OK, this is very strange. Your reference doc has a Title style, but it doesn't get applied. I will have to look into this.

jgm avatar Oct 13 '24 15:10 jgm

But when I try the same thing you did -- same method of creating a reference.docx -- it works fine.

jgm avatar Oct 13 '24 15:10 jgm

OK, the issue is this. Your reference.docx has w:styleId="para6", and this style has <w:name w:val="Title">. The styleId needs to be Title.

When you create your reference.docx, go to the Styles menu, find the already existing Title style, and modify this. I'm not sure what you did differently to create the style you had.
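
To check which styleId a given style name carries in a reference file, the pairs can be read straight from word/styles.xml inside the .docx archive. The following is a minimal sketch, not part of pandoc; the function name list_styles is made up for illustration, and it assumes the file is a regular .docx zip archive with a word/styles.xml part.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def list_styles(docx):
    """Return (styleId, name, type) for every style defined in a .docx.

    `docx` may be a file path or a binary file-like object.
    (Hypothetical helper for illustration; not a pandoc API.)
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    styles = []
    for style in root.iter(W + "style"):
        name_el = style.find(W + "name")
        name = name_el.get(W + "val") if name_el is not None else None
        styles.append((style.get(W + "styleId"), name, style.get(W + "type")))
    return styles
```

Calling list_styles("my-reference-3_5.docx") and looking for the entry whose name is Title shows whether its styleId is also Title or something like the para6 mentioned above.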

jgm avatar Oct 13 '24 15:10 jgm

I will test it tomorrow. I still have a computer with an older version of Pandoc. I haven't had the problem on this system so far. I will check if the versions of Word are the same, which they should be. My system and Word versions are German. As far as I can remember, the style sheets that were generated from Pandoc were always in English, which did not cause any problems. In Word they were also displayed with the names in English. Now it looks as if Pandoc or Word translates the name of the template (in my case into German). The fact that I use macOS may also play a role.

There are therefore several factors that can play a role:

  • Pandoc version
  • Microsoft Word Version
  • System and / or Word language
  • Operating system

and certainly also the user (in this case me). However, I have tried to rule out errors by creating a new test file including a reference file from scratch.

The style id can probably change depending on the language, which is probably why the names are used. If these are now translated (by whatever means) or adapted to the system language, this would provide an explanation.

sschaenz avatar Oct 13 '24 18:10 sschaenz

If I recall, we use styleId and not the display name, because that is the thing that is constant across differently localized versions of Word.

jgm avatar Oct 13 '24 19:10 jgm

pandoc 3.1.12.2

Here’s one relevant note: “Detect caption by style name not id (https://github.com/jgm/pandoc/issues/9518). The styleId can change depending on the localization.”

see above.

sschaenz avatar Oct 13 '24 20:10 sschaenz

OK, I got it reversed then. I knew it was one way or the other!

jgm avatar Oct 13 '24 21:10 jgm

Here are my tests and the results:

macOS 14.6.1 (Sonoma) Intel Core i7

Microsoft Word for Mac Version 16.89.1 (German) Licence: Microsoft 365 subscription

$ pandoc --version
pandoc 3.1.12.2

pandoc -o reference-3_1_12.docx --print-default-data-file reference.docx

The format template Title is displayed in Word as ‘Title’ and Subtitle as ‘Subtitle’.

Open reference-3_1_12.docx with Word. There are no errors and no hints when opening. Everything is OK. Customise the style sheet and save as my-reference-3_1_12.docx.

pandoc my-markdown.md -o my-word-3_1_12.docx --reference-doc=my-reference-3_1_12.docx

Result:

Word file is opened.

1st warning: ‘This document contains fields that may refer to other files. Do you want to update the fields in this document?’ (ok)
2nd warning: the table of contents needs to be updated. (ok)

As Word has not yet been able to set the fields and the table of contents to current, valid values, these prompts are understandable. However, you should always open a generated Word file and save it again before sending it to other people, as recipients often report back that the file is damaged!

The formatting is correct, everything is as it should be.

System on which the problem occurs:

macOS 15.0.1 (Sequoia) M3 Pro (ARM) (Note: I must not have been paying attention, as I would rather not switch to Sequoia for a few months. A new major version of Apple can have all sorts of side effects. You should therefore wait at least 3 months until you upgrade, and the first updates are available).

Microsoft Word for Mac Version 16.89.1 (German) Licence: Microsoft 365 subscription

Open reference-3_5.docx with Word. There are no errors or messages when opening. Everything is OK. Customize the style sheet and save as my-reference-3_5.docx. I have also created a version with compatibility mode activated (my-reference-3_5-cmp.docx) to rule out any problems.

pandoc my-markdown.md -o my-word-3_5.docx --reference-doc=my-reference-3_5.docx
pandoc my-markdown.md -o my-word-3_5-cmp.docx --reference-doc=my-reference-3_5-cmp.docx

Result

The Word file opens with the same prompts as before. However, the formatting is not correct. The content of the title and subtitle is correct, but their formatting, and that of the date, is reset to the default. The author is correct (and its style has the English name). For ‘Table of Contents’ and the chapter and section headings, the style names are displayed in German, but the formatting is correct.

I actually suspect the issue is with Pandoc. For once, I would like to exclude Word as the source of the issue. Theoretically, the difference in macOS versions could still be a cause. If this is the case, then there is hardly anything you can do, and you are at the mercy of the folks at Apple. The different architecture of the CPU should not play a role.

I have now installed Pandoc 3.1.12 on the system and repeated the tests. There are no problems. The formatting appears to be correct. So it is probably not due to the different macOS versions.

One last test:

Since I had installed Pandoc 3.5 with brew, I did another installation with the version from the site https://github.com/jgm/pandoc/releases/tag/3.5. I almost expected that this would solve the problem. Unfortunately not. Something has been changed somewhere in Pandoc that is causing the issues. That's the end of my ideas. But I'll save myself the trouble of trying to find out from which version the problem occurs.

sschaenz avatar Oct 14 '24 12:10 sschaenz

I can't see how the issue would be the OS version.

However, I use pandoc 3.5 on ARM macOS (previous version), and I don't have any difficulties customizing the style. There are two factors that may be different in our cases:

  • your German-localized version of Word (I have English-localized v16.83)
  • how you modify the styles (maybe you are doing something different than I am to change the styles -- I simply go Format -> Styles and select the Title style, then modify it there and save).

One thing that is clearly different is the styleId of the style named Title in your reference docx. This may be relevant, though as you note, we claim to be looking up styles by name.

When I have a chance I can look into this further.

jgm avatar Oct 15 '24 15:10 jgm

I don't think it's related to the OS version either; I just wanted to mention all the possibilities that came to my mind. I suspect the problem is the names of the styles. Word translates these, into German for example. If I make a change, the style is saved and the translated name is used. Then pandoc can no longer function properly. This is normally why you use IDs and not names. I don't know if Microsoft sees it differently.

Anyway, I will try the following in the next few days: I will create a template with the current version and then use a text editor to search and replace the style names with their English equivalents. This should be a workaround. Since templates are not constantly customized, that should be okay with me. I'll let you know when I've tried it.

For a quick test, you can also rename Title in one of your reference files to the German translation, Titel. If the formatting is then lost and the style falls back to Standard, then that is the problem.

sschaenz avatar Oct 15 '24 16:10 sschaenz

Note that in my-reference-docx-3_5.docx, styles.xml has

<w:style w:type="paragraph" w:styleId="para4">
<w:name w:val="Title"/>
<w:qFormat/>
<w:basedOn w:val="para0"/>
<w:next w:val="para1"/>
<w:pPr>
<w:spacing w:before="1600" w:after="80"/>
<w:contextualSpacing/>
<w:jc w:val="center"/>
<w:pBdr>
<w:top w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:left w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:bottom w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:right w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:between w:val="nil" w:sz="0" w:space="0" w:color="000000" tmln="20, 20, 20, 0, 0"/>
</w:pBdr>
<w:shd w:val="solid" w:color="365F91" tmshd="1677721856, 16777215, 9527094"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Aptos Display" w:hAnsi="Aptos Display" w:eastAsia="Aptos Display" w:cs="Aptos Display"/>
<w:color w:val="ffffff"/>
<w:spacing w:val="-10" w:percent="96"/>
<w:kern w:val="1"/>
<w:sz w:val="56"/>
<w:szCs w:val="56"/>
<w:lang w:bidi="en-us"/>
</w:rPr>
</w:style>

and the name specified here is "Title", not "Titel". So any localization of that name must be happening somewhere outside the stylesheet. Since according to the commit comment you mentioned above, we are looking up styles by name and not styleId, we should be finding this style.

jgm avatar Oct 16 '24 19:10 jgm

PS. I tried manually changing the styleId from para4 to Title, and then it worked.
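
The manual change described above can also be scripted. What follows is only a sketch of that workaround under several assumptions: the helper names rename_style_id and rewrite_docx are invented for this example, the XML parts are UTF-8 with double-quoted attributes, and no style with the new styleId exists yet. It edits the XML as text rather than re-serialising it, so namespace declarations stay untouched.

```python
import re
import zipfile

# Elements whose w:val attribute refers to another style by its styleId.
_REF_ELEMENTS = "basedOn|next|link|pStyle"


def rename_style_id(xml_text, old_id, new_id):
    """Rename a styleId and references to it in WordprocessingML text.

    Hypothetical helper; w:name values (display names) are left alone.
    """
    if f'w:styleId="{new_id}"' in xml_text:
        raise ValueError(f"a style with styleId {new_id!r} already exists")
    old = re.escape(old_id)

    def swap(match):
        return match.group(1) + new_id + match.group(2)

    # The definition itself: <w:style ... w:styleId="old" ...>
    xml_text = re.sub(rf'(\bw:styleId="){old}(")', swap, xml_text)
    # References such as <w:basedOn w:val="old"/> or <w:pStyle w:val="old"/>
    xml_text = re.sub(
        rf'(<w:(?:{_REF_ELEMENTS})\b[^>]*?\bw:val="){old}(")', swap, xml_text
    )
    return xml_text


def rewrite_docx(src, dst, old_id, new_id):
    """Copy a .docx from src to dst, renaming old_id in every word/*.xml part."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.startswith("word/") and item.filename.endswith(".xml"):
                text = data.decode("utf-8")  # assumption: parts are UTF-8
                data = rename_style_id(text, old_id, new_id).encode("utf-8")
            zout.writestr(item, data)
```

For the file discussed here, the call would be along the lines of rewrite_docx("my-reference-3_5.docx", "fixed-reference.docx", "para4", "Title"), where the output file name is arbitrary. Keep a copy of the original, since this has only been reasoned through for the simple case shown in this thread.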

jgm avatar Oct 16 '24 19:10 jgm

It looks like the linked commit may have been focused on just the table caption; maybe we didn't make a general change to looking up styles by name instead of styleId.

jgm avatar Oct 16 '24 19:10 jgm

It's quite counterintuitive that Word works this way -- it's the name, not the styleId, that stays constant across localized versions -- but such is MS.

jgm avatar Oct 16 '24 19:10 jgm

I have narrowed down the problem to a specific Pandoc version. When using Pandoc 3.2, titles and subtitles in the Word file are still displayed correctly according to my customised template. The changes made are properly applied.

However, a deviation occurs as of version 3.2.1: titles and subtitles only appear according to Word's default settings, regardless of the Pandoc or custom templates. My adjustments to the style sheet are ignored, but the content remains correct. Titles and subtitles are displayed with the correct content, but without the intended formatting.

I tested various Pandoc versions to investigate. The problem first appeared in version 3.2.1, while in version 3.2 the templates worked as expected. The tests were carried out with the binary versions of Pandoc, which I downloaded directly from the GitHub page (https://github.com/jgm/pandoc/releases).

I hope this helps to narrow down the error and find a solution.

I will stick with the older version for the time being until the problem is fixed.

sschaenz avatar Oct 21 '24 08:10 sschaenz

In the 3.2.1 changelog for docx writer we have two items that might be relevant:

  • Allow OpenXML templates to be used with docx (#8338, #9069, #7256, #2928). commit db559e100c02ca1f95953f3eeeca005fdc01b595

  • Clean up Abstract Title and Subtitle in default reference docx. Center Subtitle, remove color. commit c26211b0c9d3b5d4f4040b3fcfbf090ddaf276d6 (This just makes Subtitle depend on Title rather than Normal, so I don't think it's the issue.)

jgm avatar Oct 24 '24 15:10 jgm

The OpenXML template contains:

+$if(title)$
+    <w:p>
+      <w:pPr>
+        <w:pStyle w:val="Title" />
+      </w:pPr>
+      $title$
+    </w:p>
+$endif$
+$if(subtitle)$
+    <w:p>
+      <w:pPr>
+        <w:pStyle w:val="Subtitle" />
+      </w:pPr>
+      $subtitle$
+    </w:p>
+$endif$

jgm avatar Oct 24 '24 15:10 jgm

[EDITED] We produce a docx with

<w:pStyle w:val="Title" />

and the way Word deals with this is to look up the style with styleId = "Title". The default pandoc reference.docx has such a style. When you edit it with your localized Word, change the Title style, and save it again, you get (my-reference-docx_3.5):

<w:style w:type="paragraph" w:styleId="Title">
<w:name w:val="Title"/>
<w:qFormat/>
<w:basedOn w:val="para0"/>
<w:next w:val="para1"/>
<w:pPr>
<w:spacing w:before="1600" w:after="80"/>
<w:contextualSpacing/>
<w:jc w:val="center"/>
<w:pBdr>
<w:top w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
etc.

So far so good, although it's odd that Normal seems to have changed to para0.

Then, when you use this reference docx to create a new docx, (my-word_3.5.docx), styles.xml contains:

<w:style w:styleId="para4" w:type="paragraph">
<w:name w:val="Title" />
<w:qFormat />
<w:basedOn w:val="para0" />
<w:next w:val="para1" />
<w:pPr>
<w:spacing w:after="80" w:before="1600" />
<w:contextualSpacing />
<w:jc w:val="center" />
<w:pBdr>
<w:top tmln="20, 20, 20, 0, 60" w:color="000000" w:space="3" w:sz="0" w:val="nil" />
etc.

I just don't get this. When I use pandoc to do the same thing you described, using your own my-reference_3.5.docx, I don't get this result. And although your Word may be localized, your pandoc is not. So that is not the issue.
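
The mismatch described above (a paragraph pointing at pStyle "Title" while styles.xml only defines styleId "para4") can be detected mechanically in an output file. This is a rough diagnostic sketch, not something pandoc provides; the function name unresolved_paragraph_styles is hypothetical, and it assumes the archive contains word/document.xml and word/styles.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def unresolved_paragraph_styles(docx):
    """Return sorted pStyle values in document.xml with no matching styleId.

    `docx` may be a file path or a binary file-like object.
    (Hypothetical helper for illustration.)
    """
    with zipfile.ZipFile(docx) as zf:
        styles = ET.fromstring(zf.read("word/styles.xml"))
        body = ET.fromstring(zf.read("word/document.xml"))
    # Every styleId that styles.xml defines.
    defined = {s.get(W + "styleId") for s in styles.iter(W + "style")}
    # Every style a paragraph in the document body asks for.
    used = {p.get(W + "val") for p in body.iter(W + "pStyle")}
    used.discard(None)
    return sorted(used - defined)
```

If this returns a list containing Title for my-word-3_5.docx, the title paragraph refers to a styleId that the document does not define, which would match Word falling back to the default formatting.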

I ought to be able to use pandoc on the same inputs and get the same result as you. This has nothing to do with Word. So, I'm wondering whether we can repeat the entire process carefully.

Take your file linked above, my-reference_3.5.docx, and do this exact command:

echo "% Title" | pandoc --reference-doc my-reference_3.5.docx -o output.docx

And then upload output.docx.

jgm avatar Oct 24 '24 15:10 jgm

I have run into the same problem on macOS, but on a Linux system the problem does not occur. Is there any way to fix it? @jgm

ctfysh avatar Dec 13 '24 06:12 ctfysh

@ctfysh I don't understand the problem yet. See the last comment before yours. If you want to give precise instructions about how to reproduce this, I can try to reproduce it. Until I can reproduce it I probably won't be able to help.

jgm avatar Dec 13 '24 17:12 jgm

OK, see below.

output.docx

ctfysh avatar Dec 15 '24 01:12 ctfysh