
'text-transform: capitalize' makes letters that aren't the first of each word lowercase

Open oliver-s-lee opened this issue 2 years ago • 5 comments

Just a quick bug I came across: the CSS property 'text-transform: capitalize' makes every letter in each word lowercase, except for the first, which is made uppercase. According to the spec, only the first letter should be modified and the other letters should remain as typed.

cap-test.pdf

<!DOCTYPE html>
<html>
    <head>
        <style>
            body {
                text-transform: capitalize;
            }
        </style>
    </head>
    <body>
        my UPPER text
    </body>
</html>
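
As a rough illustration of the difference (not a claim about how WeasyPrint implements it), Python's built-in `str.title()` produces the same kind of output as the report describes, whereas the expected result leaves the remaining letters untouched. The helper `capitalize_first_letters` is a made-up name for this sketch and only handles space-separated words:

```python
def capitalize_first_letters(text: str) -> str:
    # Hypothetical helper: uppercase only the first character of each
    # space-separated word and leave every other character as typed.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


sample = "my UPPER text"
reported = sample.title()                    # lowercases the rest of each word
expected = capitalize_first_letters(sample)  # keeps "UPPER" as typed

print(reported)  # My Upper Text
print(expected)  # My UPPER Text
```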

oliver-s-lee avatar Mar 24 '22 09:03 oliver-s-lee

Thanks for this bug report!

Nobody has complained since this feature was added more than 10 years ago in 6ee2bad.

liZe avatar Mar 24 '22 11:03 liZe

The following seems to mimic the CSS property:

import re
# Raw string keeps \s, \W and \w intact for the regex engine
CAPITALIZE_RE = re.compile(r'\s*(^\W*|\s\W*)(\w)', re.MULTILINE)
CAPITALIZE_RE.sub(lambda m: m.group(1) + m.group(2).upper(), text)
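
A self-contained way to try the idea is sketched below. It is a variation on the snippet above rather than a copy: the leading `\s*` is left out, on the assumption that whitespace matched outside the capture groups would be dropped from the replacement. The wrapper name `capitalize_words` is hypothetical.

```python
import re

# Assumption: same approach as the snippet above, minus the leading \s*,
# so that everything matched before the letter is kept via group 1.
CAPITALIZE_RE = re.compile(r'(^\W*|\s\W*)(\w)', re.MULTILINE)


def capitalize_words(text: str) -> str:
    # Uppercase the first word character after a line start or whitespace;
    # all other characters are copied through unchanged.
    return CAPITALIZE_RE.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_words("my UPPER text"))  # My UPPER Text
```

This only covers what `\w` considers a word character, so it is a first approximation of the CSS behaviour at best.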

summersz avatar Mar 24 '22 11:03 summersz

> The following seems to mimic the CSS property:

We don’t want to change the first letter but the first typographic letter unit. CSS is never easy…

A function for this already exists somewhere for the :first-letter selector; I hope that it uses the same definition.
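
To make the distinction concrete, here is a minimal sketch using only the standard library. It assumes that a typographic letter unit can be approximated by a single code point whose Unicode general category is Letter (L*) or Number (N*), which ignores grapheme clusters and combining marks, and that words are separated by whitespace. The function names are hypothetical and are not WeasyPrint's.

```python
import re
import unicodedata


def is_letter_unit(ch: str) -> bool:
    # Approximation: Letter (L*) or Number (N*) general category.
    return unicodedata.category(ch)[0] in ("L", "N")


def capitalize_word(word: str) -> str:
    # Skip leading punctuation and change only the first letter unit.
    for i, ch in enumerate(word):
        if is_letter_unit(ch):
            # str.title() on a single character applies its titlecase mapping.
            return word[:i] + ch.title() + word[i + 1:]
    return word


def capitalize(text: str) -> str:
    # Split on whitespace runs, keeping them so spacing is preserved.
    return "".join(capitalize_word(part) for part in re.split(r"(\s+)", text))


print(capitalize("my UPPER text"))   # My UPPER Text
print(capitalize("(hello) _world"))  # (Hello) _World
print(capitalize("3d print"))        # 3d Print
```

Unlike a `\w`-based regex, this skips the underscore in `_world`, because `_` is punctuation rather than a letter, and it treats the digit in `3d` as the first unit, so the `d` is left alone.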

liZe avatar Mar 24 '22 11:03 liZe

Ah sorry for the multiple pings here. I had to make a few small changes and didn't think it would link here until I'd made the pull request.

I've modified the capitalize function to use Unicode typographic letter units, which required the regex module in order to support Unicode grapheme matching. I've run some tests and it appears to match the CSS behaviour when applying "text-transform: capitalize".

I spent some time reading through the CSS documentation regarding typesetting and what defines a "letter" in this context, and I believe this now works as expected.

VeteraNovis avatar Aug 07 '22 13:08 VeteraNovis

> The following seems to mimic the CSS property:
>
> CAPITALIZE_RE = re.compile(r'\s*(^\W*|\s\W*)(\w)', re.MULTILINE)
> CAPITALIZE_RE.sub(lambda m: m.group(1) + m.group(2).upper(), text)

Thanks for starting me on the right path :+1:

VeteraNovis avatar Aug 07 '22 13:08 VeteraNovis

Thanks @VeteraNovis! It's fixed by #1703.

liZe avatar Aug 15 '22 14:08 liZe