Text-based "artwork" in PDF is misaligned when a line has leading whitespace and its first non-whitespace character is not US-ASCII
Describe the issue
See the attached XML file, which contains some text-based <artwork> objects.
Textual artwork like this:
└┬╴Root of the tree
├─╴Child A
└─╴Child B
should show the vertical stem aligned, with the C of the child rows under the first o of the root row.
In practice, in a PDF, it tends to be misaligned, like this:
Interestingly, if there is a leading non-space character on each line, it seems to be correctly aligned:
This is an impediment to the effective processing of draft-ietf-lamps-e2e-mail-guidance, which is in AUTH48 right now.
I submitted test.xml to https://author-tools.ietf.org/ and it yielded test.pdf which shows:
From the underlying XML source:
<section anchor="utf8">
<name>Test with UTF-8</name>
<t>This artwork contains some non-ASCII UTF-8 chars:</t>
<artwork><![CDATA[
└┬╴Root of the tree
├─╴Child A
└─╴Child B
]]></artwork>
</section>
<section anchor="us-ascii">
<name>Test with US-ASCII</name>
<t>This artwork contains only US-ASCII:</t>
<artwork><![CDATA[
\-+- Root of the tree
+-- Child A
\-- Child B
]]></artwork>
</section>
<section anchor="no-leading-space">
<name>Test without leading whitespace</name>
<t>This artwork has no leading whitespace</t>
<artwork><![CDATA[
A └┬╴Root of the tree
B ├─╴Child A
C └─╴Child B
]]></artwork>
</section>
Note that the version that isn't aligned is the part with UTF-8 box drawing characters, but without the leading characters on the left-hand side.
@kesara is this a WeasyPrint issue?
Regarding the scope of this issue, artwork is misaligned in the PDF in the following sections:
- rfc9787.pdf: Sections 4.1.1.1, 4.1.2.1, 4.1.2.2, 6.2.1.1 (second structure), and 7.3 (both structures)
- rfc9788.pdf: Sections 1.9, 4.5.1 (first structure), 4.5.2 (first structure), and 4.10.1; Appendix C.1.3; Appendices C.1.5-C.1.8, C.2.2-C.2.6, and C.3.1-C.3.17
#1259 is a partial fix which addresses the box-drawing character issues. See examples: rfc9787 and rfc9788
The issue with the arrow characters might be due to the Roboto Mono fonts or to how WeasyPrint uses them. I need to look further into that.
@kesara thanks for working on a fix! on page 12 of https://t4.fq.nz/xml2rfc.1259/rfc9787.pdf i see this:
But the document source should have one fewer line, as seen at https://www.ietf.org/archive/id/draft-ietf-lamps-e2e-mail-guidance-17.html#section-4.1.1.1:
└┬╴multipart/signed; protocol="application/pkcs7-signature"
├─╴[protected part]
└─╴application/pkcs7-signature
did the change introduce a new line into the generated PDF output, or was that just from fiddling with the source XML during testing?
@dkg, I had modified the original XML ~~but it wasn't causing that gap.~~ I have updated the PDF output using the original XML.
~~I think the gap is caused by the square brackets. Maybe WeasyPrint is using a different font or rendering that line height incorrectly.~~
Well, I have used the wrong source again. Looks like that gap was due to my changes.
gotcha, thanks for double-checking. i think the problem there wasn't the gap, it was that there were 4 lines instead of 3 :)
So now we just need to understand why the lines that start with whitespace followed by either ⇩ or ↧ aren't using monospace whitespace before the first printable character.
RFCs use the Roboto Mono font for monospace characters. It looks like those arrow characters are not provided by Roboto Mono.
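To find the affected lines in a piece of artwork, a rough sketch like the following might help. It flags lines that have leading whitespace and whose first printable character falls outside a given coverage set. The function name and the `ASCII_ONLY` set are made up for illustration; a real check would need the actual code point coverage of Roboto Mono, which is not reproduced here.

```python
def lines_needing_fallback(artwork: str, covered: set) -> list:
    """Return 1-based numbers of lines that start with spaces and whose
    first non-space character is NOT in the `covered` code point set."""
    hits = []
    for number, line in enumerate(artwork.splitlines(), start=1):
        stripped = line.lstrip(" ")
        has_leading_space = len(stripped) < len(line)
        if stripped and has_leading_space and ord(stripped[0]) not in covered:
            hits.append(number)
    return hits


# Hypothetical stand-in for the primary font's coverage: printable ASCII only.
ASCII_ONLY = set(range(0x20, 0x7F))

sample = (
    "A line without indent\n"
    "    \u21e9 indented, starts with U+21E9\n"
    "    indented plain text\n"
    "\u21a7 not indented, starts with U+21A7\n"
)
print(lines_needing_fallback(sample, ASCII_ONLY))
```

Only the second line of the sample is reported, since it is the only one that combines leading whitespace with an uncovered first character.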
What if you put + [ "monospace" ] before NOTO_SYMBOLS in the monospaced font list (since neither entry in NOTO_SYMBOLS appears to be fixed width)? or, what if you explicitly ask for Noto Sans Mono as a fallback for glyphs missing from Roboto Mono, as suggested in #1261 ?
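For anyone following along, here is a simplified model of why the order of the fallback list matters. It picks, per character, the first family in the list that claims to cover it. The family names and coverage sets are invented for the example (what a generic `monospace` resolves to depends on the installed fonts), and the model does not capture the observed behaviour where the preceding whitespace also switches to the fallback font.

```python
from typing import Dict, List, Optional, Set


def pick_font(ch: str, stack: List[str], coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first family in `stack` whose coverage includes `ch`."""
    for family in stack:
        if ord(ch) in coverage.get(family, set()):
            return family
    return None


# Hypothetical coverage data, for illustration only.
COVERAGE = {
    "Roboto Mono": set(range(0x20, 0x7F)),
    "monospace": set(range(0x20, 0x7F)) | {0x21E9},
    "Noto Sans Symbols 2": {0x21A7, 0x21E9},
}

current_stack = ["Roboto Mono", "Noto Sans Symbols 2"]
proposed_stack = ["Roboto Mono", "monospace", "Noto Sans Symbols 2"]

print(pick_font("\u21e9", current_stack, COVERAGE))   # symbols font wins
print(pick_font("\u21e9", proposed_stack, COVERAGE))  # generic monospace wins
```

Under these assumed coverage sets, inserting a fixed-width entry ahead of the symbols fonts changes which family supplies U+21E9, which is the effect the suggestion above is after.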
@kesara, thanks for running it through your tooling to re-generate the PDF over on #1261. it appears that it's still using Noto-Sans-Symbols2 for U+21A7 and U+21E9 (and all whitespace that precedes those chars on any given line).
I looked in the Noto and Roboto font families and found that while U+21E9 is present in Noto Sans Mono CJK, U+21A7 is not present in any of them.
And anyway, the Mono CJK variants all appear to have a different character width for the same font size than the other Mono fonts, because they follow the "halfwidth" design of most Asian font families. This means that even though they're "fixed width", their fixed width won't align with the fixed width of the non-Asian families.
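To illustrate the width mismatch with some arithmetic: if two fixed-width fonts have different advance widths, the horizontal position of column N differs between them by N times the difference. The advance values below (600 and 500 thousandths of an em) and the 10pt size are illustrative numbers only, not measurements of any of the fonts discussed here.

```python
from fractions import Fraction


def column_x(columns: int, advance_per_mille: int, font_size_pt: int) -> Fraction:
    """Horizontal offset (in points) after `columns` fixed-width cells,
    where each cell advances `advance_per_mille`/1000 of the font size."""
    return Fraction(columns * advance_per_mille * font_size_pt, 1000)


# Illustrative numbers: 4 leading spaces at 10pt in two different fixed-width fonts.
wide = column_x(4, 600, 10)    # 24 pt
narrow = column_x(4, 500, 10)  # 20 pt
drift = wide - narrow
print(f"drift after 4 columns: {drift} pt")
```

With these made-up numbers the fifth column already starts 4pt apart, so a line whose indentation is set in one font will not line up with neighbours set in the other.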
So, i see four possible resolutions:
1. Augment Roboto Mono or Noto Mono by adding the two glyphs in question, and ensure they have the same width as the rest of the font.
2. Fix the leading whitespace to use Roboto Mono on lines that start with non-whitespace glyphs that require a fallback font, and accept the minor horizontal variance caused by the single glyph in the fallback font.
3. Insert a fixed-width, non-Roboto, non-Noto font (that contains the two glyphs in question) as another fallback font before the Noto Symbols font. (i've offered this in the second commit over on #1261)
4. Replace these two symbols with codepoints that do have glyphs in either Roboto Mono or Noto Sans Mono. For example, we could use U+2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL (┴) to mean "unwraps to" and U+2567 BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE (╧) to mean "decrypts to". These two symbols are in Noto Mono, but not in Roboto Mono, so i think we'd still need the first commit over on #1261.
Do you have any preference to how this should be resolved?
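For reference, the last option (swapping the two code points) is a one-for-one substitution, so it would not change the column count of any line. A minimal sketch with Python's `str.translate` follows. The pairing of each arrow with its replacement is an assumption based on the order in which the symbols are mentioned in this thread, and `replace_arrows` is a made-up helper name.

```python
import unicodedata

# Assumed pairing (not confirmed in this thread):
#   U+21E9 -> U+2534 for "unwraps to"
#   U+21A7 -> U+2567 for "decrypts to"
REPLACEMENTS = {
    0x21E9: 0x2534,
    0x21A7: 0x2567,
}


def replace_arrows(text: str) -> str:
    """Swap each arrow for its box-drawing replacement, one code point for one."""
    return text.translate(REPLACEMENTS)


for old, new in REPLACEMENTS.items():
    print(
        f"U+{old:04X} {unicodedata.name(chr(old))} -> "
        f"U+{new:04X} {unicodedata.name(chr(new))}"
    )

line = "    \u21e9 (unwraps to)"
assert len(replace_arrows(line)) == len(line)  # column count is preserved
print(replace_arrows(line))
```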
From the discussion over on #1261, it sounds like option 3 is out, unless the RPAT changes its decision.
Fwiw, i've proposed an edit for RFC 9787's XML source that takes approach 4. If the RFC editor gives me a 👍 then i will propose a comparable edit for RFC 9788 as well. This approach will still rely on you to merge something like 00dae3832b3583099a1a92c7048eeeff102243d2 though.
@dkg I will bring this issue up at the next RPAT meeting.
Thanks, @kesara ! let me know if i can be of any further help.
Perhaps RPAT would be interested in https://github.com/ietf-tools/idnits/issues/186 as a proposed outcome of the decision to use only Noto and Roboto Mono in the PDF format.