typogrify
widont line break/newline behavior
If a string ends with a <br> and a single word, widont does nothing:
>>> widont('blah<br>blah')
'blah<br>blah'
This makes sense to me. But if a string ends with a <br>\n and a single word, widont replaces the newline with a &nbsp;:
>>> widont('blah<br>\nblah')
'blah<br>&nbsp;blah'
This doesn't seem right. The first would render as two lines:
blah
blah
while the second would render with a non-breaking space at the start of the second line:
blah
 blah
>>> import re
>>> re.match(r'\s', '\n')
<_sre.SRE_Match object; span=(0, 1), match='\n'>
>>> re.match(r'\s', r'\n')
>>>
This result came as a surprise to me, but explains why widont has this behavior with newlines. But is this the desired behavior? That is, is the text passed in supposed to be escaped already?
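To make the distinction concrete, here is a small check using only the standard re module on the same input. The variable names are just for illustration. The non-raw literal contains a real line feed, which \s matches, while the raw literal contains a backslash followed by the letter n, which is not whitespace.

```python
import re

text = 'blah<br>\nblah'       # contains a real line feed (one character)
raw_text = r'blah<br>\nblah'  # contains a backslash followed by 'n' (two characters)

# The raw string is one character longer because '\n' is not interpreted.
print(len(text), len(raw_text))

# \s matches the line feed in the first string.
has_whitespace = re.search(r'\s', text) is not None

# The second string contains no whitespace at all.
raw_has_whitespace = re.search(r'\s', raw_text) is not None

print(has_whitespace, raw_has_whitespace)
```

So any pattern that uses \s to find the space before the last word will also treat a line feed as that space.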
I believe this would be the easiest way to get the correct behavior in the above example:
text = 'blah<br>\nblah'
text = text.encode('unicode-escape') # b'blah<br>\\nblah'
text = text.decode('utf-8') # 'blah<br>\\nblah'
text = widont(text) # 'blah<br>\\nblah'
text = text.encode('utf-8') # b'blah<br>\\nblah'
text = text.decode('unicode-escape') # 'blah<br>\nblah'
It seems like it would be preferable for typogrify to deal with this, and I think it can be done without any encoding/decoding.
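As a rough sketch of what that might look like, the function below only treats spaces and tabs as the gap before the last word, so a newline after a <br> is left alone. This is a hypothetical, heavily simplified stand-in (widont_sketch and _LAST_GAP are made-up names), not typogrify's actual implementation. It ignores the HTML tag handling the real filter would need.

```python
import re

# Hypothetical simplification, NOT typogrify's real pattern:
# one or more spaces/tabs, then a final word, then optional trailing whitespace.
# Newlines are deliberately excluded from the gap that gets replaced.
_LAST_GAP = re.compile(r'[ \t]+(\S+\s*)$')


def widont_sketch(text):
    """Replace the horizontal whitespace before the last word with &nbsp;."""
    return _LAST_GAP.sub(r'&nbsp;\1', text)


print(widont_sketch('blah blah'))
print(widont_sketch('blah<br>blah'))
print(repr(widont_sketch('blah<br>\nblah')))
```

With this pattern, 'blah blah' still gets its non-breaking space, while both 'blah<br>blah' and 'blah<br>\nblah' come back unchanged.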