WeasyPrint Non-breaking spaces have different size

In this rendering: (image)

The first two words (o kliknutí) have an NBSP between them (Czech typography rules). However, IMHO this only forbids line breaks; there is no reason for this space to have a different size than the other inter-word spaces on the same line.

Oct 06 '24 12:10 ondras

You’re right. The current justification code adds extra space to space characters only. There are many places in the code where we assume that spaces are only "normal" spaces.

In this case, the problem is in: https://github.com/Kozea/WeasyPrint/blob/1aae1452da384de0160f599d4e2500aa1f3bebfc/weasyprint/layout/inline.py#L1119-L1125

We count spaces and use that count to set justification_spacing. We should change our space detection here: https://github.com/Kozea/WeasyPrint/blob/1aae1452da384de0160f599d4e2500aa1f3bebfc/weasyprint/layout/inline.py#L1131

And fix our spacing adjustment here: https://github.com/Kozea/WeasyPrint/blob/1aae1452da384de0160f599d4e2500aa1f3bebfc/weasyprint/text/line_break.py#L182-L196

For now we don’t have to support all justification opportunities as we don’t support text-justify, but we can at least support word separators. I’m not sure that a real list is actually defined in Unicode, as there are exceptions such as punctuation and fixed-width spaces. We can at least start with the list given by the specification.
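To make the idea concrete, here is a minimal sketch of counting word separators and deriving the extra width per separator. It is not WeasyPrint's actual code: the function name, its parameters and the two-character separator set are assumptions made for illustration only.

```python
# Hypothetical sketch, not WeasyPrint's actual implementation.
# Assumption: only U+0020 and U+00A0 are treated as expandable word separators.
WORD_SEPARATORS = frozenset({'\u0020', '\u00a0'})


def justification_spacing(text, line_width, available_width):
    """Return the extra width to add to each word separator in ``text``.

    ``line_width`` is the natural width of the line and ``available_width``
    the width it has to fill once justified.
    """
    count = sum(1 for char in text if char in WORD_SEPARATORS)
    if count == 0:
        # No justification opportunity: leave the line untouched.
        return 0
    return (available_width - line_width) / count
```

With this model, a no-break space and a regular space on the same line receive the same extra width, while other space-like characters such as U+202F receive none.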

Oct 09 '24 08:10 liZe

> I’m not sure that a real list is actually defined in Unicode, as there are exceptions such as punctuation and fixed-width spaces.

Pango provides is_expandable_space, so we have everything we need to support this correctly.

Jan 26 '25 13:01 liZe

Hello. I committed a fix for the issue. Should I write a test for it? If yes, how? I did not use Pango's is_expandable_space, as it just searches for \u0020 and \u0040.

Feb 21 '25 17:02 luca-vercelli

Hi @luca-vercelli,

Thanks for your commit.

> I did not use Pango's is_expandable_space, as it just searches for \u0020 and \u0040.

You got the idea. It actually looks for \u0020 and \u00A0, which is what we want, but you’re right, we can look for these characters ourselves. I think we can even use a regular expression instead of iterating over the bytestring.
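Something along these lines could work for the regular expression, assuming the text is available as a UTF-8 bytestring. The pattern and function names are made up for this example and are not taken from the commit.

```python
import re

# Assumption: the text is UTF-8 encoded, so U+0020 is b' ' and
# U+00A0 is the two-byte sequence b'\xc2\xa0'.
EXPANDABLE_SPACE = re.compile(b' |\xc2\xa0')


def expandable_space_offsets(utf8_text):
    """Return the byte offset of each expandable space in ``utf8_text``."""
    return [match.start() for match in EXPANDABLE_SPACE.finditer(utf8_text)]
```

In valid UTF-8 the byte 0xC2 only appears as the lead byte of a two-byte sequence, so the pattern cannot match in the middle of another character. Other spaces such as U+202F (encoded as b'\xe2\x80\xaf') are left alone.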

Could you please open a PR so that I can tweak a couple of things in your commit?

> Should I write a test for it? If yes, how?

That would be great! You can add a test in tests/draw/test_text.py, copy test_text_align_justify and try different spaces (\u0020, \u00a0, \u202F…) and check that only the first two spaces get extra space when justified.

If you don’t get the logic of these tests, don’t worry, I’ll add one for you.
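To illustrate what the test should check, here is a toy model rather than a real WeasyPrint test: it does not use the helpers from tests/draw/test_text.py, it assumes every character has the same width, and the function name is hypothetical.

```python
# Toy model, not WeasyPrint's test harness.
# Assumption: only U+0020 and U+00A0 are expandable when justifying.
EXPANDABLE = frozenset({'\u0020', '\u00a0'})


def justified_space_widths(text, char_width, available_width):
    """Return the width of each space-like character once ``text`` is justified.

    Every character is assumed to be ``char_width`` wide before justification.
    """
    spaces = [char for char in text if char.isspace()]
    expandable_count = sum(1 for char in spaces if char in EXPANDABLE)
    natural_width = len(text) * char_width
    extra = 0
    if expandable_count:
        extra = (available_width - natural_width) / expandable_count
    return [
        char_width + (extra if char in EXPANDABLE else 0)
        for char in spaces
    ]


# The first two spaces grow, the narrow no-break space keeps its width.
widths = justified_space_widths('a b\u00a0c\u202fd', 2, 20)
assert widths == [5, 5, 2]
```

The real test would check the same property on rendered pixels: the gaps for \u0020 and \u00a0 widen equally, and the \u202F gap does not change.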

Feb 21 '25 21:02 liZe

Fixed by #2390.

Mar 02 '25 16:03 liZe