
docx writer: Option to customize (or remove) title/author block

Open HeirOfNorton opened this issue 7 years ago • 11 comments

Hello, would it be possible to add a way to customize, or at least remove, the Title/Author/Date block in Docx output?

Nearly every journal or publisher has its own format for the title page or title block of submitted documents, but since templates are not feasible in Docx, the Title/Author/Date block is hardcoded. This means a Docx file created with Pandoc has to be edited manually to fix the title block.

If there were a command-line option to suppress the title block entirely, users could make their own title block in the source file or with a filter, using the new Custom Classes in master to control the style formatting.

Thank you.

Sep 07 '16 21:09 HeirOfNorton

If you're willing to use a filter, you could simply have the filter remove title/author/date metadata.

Sep 07 '16 21:09 jgm

True, and probably what I will do in the meantime if necessary. That is not ideal, though, since that would also remove the Title and Author metadata from the resulting Docx file properties.

Sep 08 '16 01:09 HeirOfNorton

This would be a problem that would be solved by being able to use proper .dotx templates, right? It might be time for us to start trying to think through that.

Sep 09 '16 12:09 jkr

Kinda? .dotx templates don't really do what Pandoc templates do. They are just normal Word documents that, when opened, create a new copy of the file rather than saving over the original.

Word does have a way to mark a spot for inserting other text, using the REF field code. E.g.:

<w:fldSimple w:instr="REF Title">
    <w:r>
        <w:t>This will insert the Bookmark named Title</w:t>
    </w:r>
</w:fldSimple>

But these can be difficult for users to insert. I think they can only be inserted for bookmarks that already exist in the document. Maybe, in that case, include basic bookmarks (Title, Author, etc.) in the default template.docx? I dunno, I don't know how difficult it is to parse such things in the source.

Sep 09 '16 14:09 HeirOfNorton

Right -- I was thinking of parsing the fields and instrText. In particular, I was interested in the fact that you could make one with the TITLE field and so on. A bit of experimentation suggests that they might actually be easier to parse than they are to make -- it really is a pain if you're not using one of the prefab ones.
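As a rough illustration of how parseable these fields are, the sketch below lists the field instructions found in a document's XML. It uses only the Python standard library. The helper name `field_instructions` is hypothetical, and the sketch assumes each `w:instrText` element holds a whole instruction; Word may split a complex field's instruction across several runs, which this does not reassemble.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def field_instructions(document_xml):
    """Return the instruction strings of all fields in a document.xml payload.

    Covers simple fields (w:fldSimple with a w:instr attribute) and the
    w:instrText elements of complex fields. Hypothetical helper, not part
    of pandoc.
    """
    root = ET.fromstring(document_xml)
    found = []
    for elem in root.iter():
        if elem.tag == f"{{{W}}}fldSimple":
            found.append(elem.get(f"{{{W}}}instr", "").strip())
        elif elem.tag == f"{{{W}}}instrText":
            found.append((elem.text or "").strip())
    return found
```

A reader along these lines could look for instructions such as `TITLE` or `REF Title` in a reference document and substitute metadata at those positions.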

The main plus here is that they do allow for some more fine-grained control of output. The problem with adding a command-line flag is that there might be someone who wants, say, Title, but not Author. It would be nice if there was a way to allow that. At the moment, while I think through this, it might remain filters, though.

Also -- not that this is a permanent solution -- it wouldn't be too hard to write a post-processing script in Python that would inject the title and author into docProps/core.xml after you filter them out (zipfile, etree, PyYAML).
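A minimal sketch of such a post-processing step, assuming the title and author are already known (reading them from the source's YAML block is left out). The function names are hypothetical. A zip archive cannot be edited in place, so the script copies every entry into a new file and rewrites only docProps/core.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml; registering them keeps the
# conventional prefixes when the XML is serialized again.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

CORE = "docProps/core.xml"


def set_core_properties(core_xml, title, author):
    """Return core.xml bytes with dc:title and dc:creator set."""
    root = ET.fromstring(core_xml)
    for tag, value in (("title", title), ("creator", author)):
        qname = "{%s}%s" % (NS["dc"], tag)
        elem = root.find(qname)
        if elem is None:
            # Create the element if the filtered document lacks it.
            elem = ET.SubElement(root, qname)
        elem.text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def inject_metadata(src, dst, title, author):
    """Copy the docx at src to dst, rewriting only docProps/core.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE:
                data = set_core_properties(data, title, author)
            zout.writestr(item, data)
```

For example, `inject_metadata("output.docx", "output-with-meta.docx", "My Title", "Jane Doe")` would restore the document properties after a filter has stripped the metadata.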

Sep 09 '16 16:09 jkr

+++ Jesse Rosenthal [Sep 09 16 05:35]:

> This would be a problem that would be solved by being able to use proper .dotx templates, right? It might be time for us to start trying to think through that.

Yes, probably a good idea.

Sep 12 '16 13:09 jgm

I would still like to see an option to not write the metadata at the head of the resulting docx. If I want to include them, I can always do that via the reference-doc, where I can put the information in the appropriate place.

Jul 03 '18 15:07 tomk3003

> If you're willing to use a filter, you could simply have the filter remove title/author/date metadata.

Here is a Lua filter for those who wish to follow that route. Put the following code in a file called stripmeta.lua:

function strip_meta(meta)
  meta.title = nil
  meta.subtitle = nil
  meta.author = nil
  meta.date = nil
  meta.abstract = nil
  return meta
end

return {{Meta = strip_meta}}

Then use it like this:

pandoc --lua-filter=stripmeta.lua -o output.docx input.md

Dec 10 '20 10:12 W1M0R

I have also often needed to skip the default metadata block in DOCX output. Is there a consensus on the preferred way forward?

As I see it, a solution would require one of the following two approaches:

(a) Implementing a simple command line switch that activates skipping unwanted output (e.g. --skip-standard-meta-output) - this is probably the simplest and the quickest solution.

(b) Implementing DOCX templating as is the case with a few other Pandoc supported formats - when designing templates in Microsoft Word, content controls can be used for this purpose.

I could probably find some time to implement the functionality under point (a) above, if the proposed approach is considered acceptable.

@jgm: I would suggest not introducing DOTX templates at the same time, because doing so:

  • will not solve the issue by itself;
  • introduces complexity that is not strictly needed;
  • can be done later in a consistent way (if needed at all).

Apr 07 '21 14:04 mjfs

Regarding @mjfs's option (a):

> (a) Implementing a simple command line switch that activates skipping unwanted output (e.g. --skip-standard-meta-output) - this is probably the simplest and the quickest solution.

Unless I'm misunderstanding the code, which is very possible, much of Pandoc's "templating" seems to be set in two lines, Docx.hs lines 785--786. Introducing a command line switch to skip certain output would probably redefine meta in line 785. Could such a switch also allow arbitrary reordering of these elements? Something taking an argument like "title, subtitle, date, abstract" would know to skip the author, but the same interface could also allow for reordering fields as needed. For instance, some humanities journals prefer "author, date, title, subtitle, abstract".
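To make the proposed semantics concrete, here is a small Python sketch of how such an argument could be interpreted. It is purely illustrative: pandoc has no such option, the name `select_title_block` is hypothetical, and the default order shown is an assumption about the current output rather than something taken from Docx.hs.

```python
# Assumed default order of the title block; not verified against Docx.hs.
DEFAULT_ORDER = ["title", "subtitle", "author", "date", "abstract"]


def select_title_block(meta, spec=None):
    """Return (name, value) pairs for the title block in the requested order.

    spec is a comma-separated list such as "author, date, title". Fields
    not named in spec are skipped, and an empty spec suppresses the block
    entirely. With spec=None the assumed default order is used.
    """
    if spec is None:
        order = DEFAULT_ORDER
    else:
        order = [name.strip() for name in spec.split(",") if name.strip()]
    # Names missing from the metadata are ignored rather than treated as errors.
    return [(name, meta[name]) for name in order if name in meta]
```

With this reading, "title, subtitle, date, abstract" drops the author, and "author, date, title, subtitle, abstract" gives the order some journals prefer. Because the lookup is by name, the same interface would also accept custom metadata keys.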

If I'm being greedy, it would also be nice to support custom metadata properties in this kind of interface, too, along the lines of what's already supported by https://github.com/jgm/pandoc/issues/3034. But that's probably another issue.

Sep 18 '22 14:09 jmclawson