ICML writer: add FirstParagraph and Bibliography styles.

Open jgm opened this issue 2 months ago • 11 comments

Closes #11268.

Not sure this is the right approach. Would it be better to add FirstParagraph in addition to Paragraph? (And similarly for Bibliography?) If the styles are nestable in this way (I don't know a thing about ICML), then this would be less disruptive, as people who have customized the Paragraph style would not need to do anything special when using the new writer.

Alternatively, perhaps we could have the style define the FirstParagraph and Bibliography styles in terms of Paragraph. (But this is less ideal for various reasons.)

jgm avatar Nov 05 '25 11:11 jgm

Nice! But I don't quite understand how the "in addition" part would work (I don't know enough about XML). I also don't know if ICML or Indesign would support that as in Indesign one can choose only one "based on" style per style. Also there is only one style per paragraph in Indesign.

Good point about disruption. It is true that the previous suggestion might break layouts until the designer sets FirstParagraph to be based on Paragraph.

But I did a quick test to see how Indesign references parent styles. It seems that FirstParagraph and Bibliography styles can be based on Paragraph like this: <BasedOn type="object">ParagraphStyle/Paragraph</BasedOn> (replaces $ID/NormalParagraphStyle)

The attached example does this and loads as expected into Indesign. When relinking an already placed ICML file to a new version, the following happens:

  • if the style exists, the newly imported style does not override it
  • if the style does not exist, it is added to the style list

I guess doing it like this (by basing FirstParagraph and Bibliography on Paragraph) would mean that everything works as before. In old layouts the additional styles would appear in the style list and can be defined from there or completely ignored.

pandoctest-1st-paragraph-and-bibliography-v2.icml.txt

hevonen avatar Nov 05 '25 16:11 hevonen

Yes, we could do it this way, but it would require some futzing with the way styles are now generated.

What about the other option: simply assigning both Paragraph and FirstParagraph styles? Does that work? Can styles override each other like in CSS?

jgm avatar Nov 05 '25 16:11 jgm

I don't think multiple styles can be added to the same element. In ICML the paragraph style is set with an attribute, and to my understanding there can be only one value per XML attribute. As the value is a string, adding new items to it would make it a different string. Trying to add more "AppliedParagraphStyle" key-value pairs to a ParagraphStyleRange element generates the error message "Duplicate attribute".
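
The duplicate-attribute behavior is a general XML well-formedness rule and can be reproduced outside Indesign. A minimal sketch, assuming Python's standard-library parser stands in for InDesign's importer (the exact error text InDesign shows may differ); `DUPLICATE` and `parse_error` are illustrative names:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the same attribute appears twice on one element.
DUPLICATE = (
    '<ParagraphStyleRange'
    ' AppliedParagraphStyle="ParagraphStyle/Paragraph"'
    ' AppliedParagraphStyle="ParagraphStyle/FirstParagraph"/>'
)

# The same element with a single attribute, for comparison.
SINGLE = '<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph"/>'


def parse_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


print(parse_error(SINGLE))     # None: one value per attribute is fine
print(parse_error(DUPLICATE))  # a "duplicate attribute" parse error
```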

But nested definitions like this do seem to work, if they are a better match for pandoc:

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
    <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/FirstParagraph">
        <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
            <Content>First paragraph.</Content>
        </CharacterStyleRange>
    </ParagraphStyleRange>
</ParagraphStyleRange>

hevonen avatar Nov 05 '25 22:11 hevonen

We do use multiple styles, though. E.g. for block quotes.

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Blockquote &gt; Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>hi</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>

jgm avatar Nov 06 '25 09:11 jgm

Hmm, I see (and maybe finally understand too), but isn't that just a single style that is named to look like two styles, from the XML point of view? There is a specific style with a matching name in the ICML root styles section:

<ParagraphStyle Self="ParagraphStyle/Blockquote &gt; Paragraph" Name="Blockquote &gt; Paragraph" LeftIndent="10">
  <Properties>
    <BasedOn type="object">$ID/NormalParagraphStyle</BasedOn>
  </Properties>
</ParagraphStyle>

Pandoc's multiple styles are a single style in ICML (a key-value pair). Every style that is used also needs to exist in ICML's root styles section to work. If e.g. "ParagraphStyle/Paragraph &gt; first" does not exist there, it is undefined, and paragraphs referring to it will default to [Basic Paragraph] in Indesign. Indesign does not combine styles from a list of styles. It reads the "list of styles" as one specific style name. Because of this, these styles look slightly weird: "Blockquote > Paragraph" instead of "Blockquote" (styles can be renamed, but if the linked ICML is updated the old styles appear again and the renamed styles go unused).
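
Whether every applied style is also defined in the root styles section can be checked mechanically. A minimal sketch, assuming un-namespaced element names as in the snippets in this thread and a simplified, hypothetical document (not a complete ICML file); `undefined_paragraph_styles` is an illustrative helper, not part of pandoc or InDesign, and treating `$ID/` names as built-in is an assumption:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for an ICML file: one defined style,
# two paragraphs, the second of which refers to a style that is not defined.
SAMPLE = """
<Document>
  <RootParagraphStyleGroup Self="pandoc_paragraph_styles">
    <ParagraphStyle Self="ParagraphStyle/Blockquote &gt; Paragraph"
                    Name="Blockquote &gt; Paragraph" LeftIndent="10"/>
  </RootParagraphStyleGroup>
  <Story Self="pandoc_story">
    <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Blockquote &gt; Paragraph"/>
    <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph &gt; first"/>
  </Story>
</Document>
"""


def undefined_paragraph_styles(icml_text):
    """Return applied paragraph style IDs that have no matching ParagraphStyle."""
    root = ET.fromstring(icml_text.strip())
    # Each style is identified by one string, even if it looks like a list.
    defined = {el.get("Self") for el in root.iter("ParagraphStyle")}
    applied = {
        el.get("AppliedParagraphStyle") for el in root.iter("ParagraphStyleRange")
    }
    # Assumption: "$ID/..." names are InDesign built-ins and need no definition.
    return sorted(
        s for s in applied
        if s and s not in defined and not s.startswith("$ID/")
    )


print(undefined_paragraph_styles(SAMPLE))  # ['ParagraphStyle/Paragraph > first']
```

The comparison is a plain string match on the whole identifier, which mirrors the observation that "Blockquote > Paragraph" is one name rather than a combination of two styles.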

So while the "Name" and "Self" attribute values can look like a list of styles, that list is a single string identifier in ICML. A working example that mimics this is attached.

It would be nice if the new first-paragraph root style had a BasedOn tag with "ParagraphStyle/Paragraph" to keep existing layouts the same (as in the attached example ICML). But for me, getting first paragraphs and the bibliography tagged is more important, as that change is quick to make in Indesign.

pandoctest-1st-paragraph-and-bibliography-v3.icml.txt

hevonen avatar Nov 06 '25 12:11 hevonen

Hm. Wouldn't we also need to say what Blockquote > FirstParagraph is based on? (And so on?)

jgm avatar Nov 07 '25 10:11 jgm

Hmm, true, I didn't think of that. But yes, if those styles are added, it would be preferable to base them on the element's basic style. Then the change would be invisible in existing layouts with linked content (and ready for styling as a bonus, as these new styles just appear in Indesign's style list). It also makes sense to have them (e.g. for styling the first paragraph of a blockquote).

If every block has a first paragraph, could that create a long list of emitted styles in ICML with nested content (I've no clue about pandoc's nesting levels or rules)? E.g. BlockQuote > FirstParagraph > BlockQuote > FirstParagraph > List > FirstItem etc.? That might look a little messy in Indesign.

hevonen avatar Nov 07 '25 12:11 hevonen

Block quotes can't be nested inside paragraphs. So you should only have FirstParagraph at the end of one of these sequences.

jgm avatar Nov 07 '25 12:11 jgm

That sounds fine! Is this complicated to do? Bibliography paragraphs probably have no need for a "first" style, and maybe there are others that can be omitted.

hevonen avatar Nov 08 '25 13:11 hevonen

OK, I have implemented a system where FirstParagraph is based on Paragraph, x > y > FirstParagraph is based on x > y > Paragraph, etc. Please test this thoroughly!
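
The naming rule described here can be sketched as follows. This is an illustration in Python, not pandoc's actual implementation; `base_style` and `style_element` are hypothetical names, and the fallback to `$ID/NormalParagraphStyle` is an assumption taken from the style definition quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

SEPARATOR = " > "  # separator used in style names such as "Blockquote > Paragraph"


def base_style(name):
    """Return the style a "... > FirstParagraph" name is based on, else None."""
    parts = name.split(SEPARATOR)
    if parts[-1] == "FirstParagraph":
        # Only the last component changes; the enclosing context is kept.
        return SEPARATOR.join(parts[:-1] + ["Paragraph"])
    return None


def style_element(name):
    """Build a ParagraphStyle definition with a BasedOn entry for `name`."""
    style = ET.Element("ParagraphStyle", Self="ParagraphStyle/" + name, Name=name)
    props = ET.SubElement(style, "Properties")
    based_on = ET.SubElement(props, "BasedOn", type="object")
    parent = base_style(name)
    # Assumption: styles without a derived parent fall back to the built-in one.
    based_on.text = (
        "ParagraphStyle/" + parent if parent else "$ID/NormalParagraphStyle"
    )
    return style


print(base_style("FirstParagraph"))          # Paragraph
print(base_style("x > y > FirstParagraph"))  # x > y > Paragraph
print(ET.tostring(style_element("Blockquote > FirstParagraph"), encoding="unicode"))
```

Because only the final component is rewritten, a layout that already styles "x > y > Paragraph" would apply the same look to the new first-paragraph style until a designer overrides it.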

jgm avatar Nov 08 '25 14:11 jgm

Thanks, I managed to compile it and it works! I tried it with a document of ≈160k characters, and at least no content was lost compared to ordinary pandoc.

I'm wondering what the best practices are for style naming and generation.

The current version emits FirstParagraph for every style, including custom styles. This can lead to a somewhat messy set of styles compared to e.g. docx export. Docx styles are also named differently (e.g. "Body Text" and "First Paragraph" vs "Paragraph" and "FirstParagraph"), but I guess that pandoc does not have codified style names across different writers?

Currently bullet list styles have their first-item style named "BulList > first". Would it be good practice to name FirstParagraph similarly, as "Paragraph > first"? And maybe omit "FirstParagraph" and "Paragraph" for the bibliography and possibly for custom styles (and treat custom styles as one exact style)? Bibliography could just be "Bibliography". Also, would it be feasible to generate "FirstParagraph" after a forced empty line (to me, empty space indicates a new section of text even without a heading)?

Also, the current FirstParagraph styles don't have a base style set (maybe not a problem).

Would it be possible to skip some of these styles (or alternatively enable them), e.g. in a metadata block? That might be a bad fit for document sources other than Markdown, and adding command line options for this is probably not wanted either.

Example of generated styles in Indesign (markdown with some custom styles, "Basic Paragraph" is Indesign's base style):

ICML                                        DOCX
[Basic Paragraph]                           [Basic Paragraph]
BulList                                     Normal
BulList > first                             Body Text
Footnote > Paragraph                        First Paragraph
Header1                                     Compact
Header1 (unnumbered)                        Title
Header2                                     Author
Header3                                     Bibliography
Paragraph                                   Heading 1
author > Paragraph                          Heading 2
quote > Blockquote > Paragraph              Heading 3
quote > Paragraph                           Block Text
quoteauthor > Blockquote > Paragraph        Footnote Text
quoteauthor > Paragraph                     quote
thanks > BulList                            quoteauthor
thanks > BulList > first                    thanks
thanks > Header2
thanks > Paragraph
title > Header1
Bibliography > Paragraph
Bibliography > FirstParagraph
FirstParagraph
author > FirstParagraph
quote > FirstParagraph
quoteauthor > Blockquote > FirstParagraph
thanks > FirstParagraph

hevonen avatar Nov 13 '25 22:11 hevonen