
widont line break/newline behavior

Open ryneeverett opened this issue 10 years ago • 1 comment

If a string ends with a <br> and a single word, widont does nothing:

>>> widont('blah<br>blah')
'blah<br>blah'

This makes sense to me. But if the <br> is followed by a newline (<br>\n), widont replaces the newline with &nbsp;:

>>> widont('blah<br>\nblah')
'blah<br>&nbsp;blah'

This doesn't seem right. While the first would render as:

blah
blah

the second would render with a non-breaking space at the start of the second line:

blah
 blah

ryneeverett, Oct 02 '14 23:10

>>> re.match(r'\s', '\n')
<_sre.SRE_Match object; span=(0, 1), match='\n'>
>>> re.match(r'\s', r'\n')
>>>

This result surprised me, but it explains widont's behavior with newlines: '\n' is a real newline character, which \s matches, whereas r'\n' is just a backslash followed by the letter n. But is this the desired behavior? That is, is the text passed to widont supposed to be escaped already?
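A minimal check of that explanation, using only Python's standard re module. The helper name whitespace_runs is illustrative and not part of typogrify; it simply lists what a \s-based pattern would see as whitespace in each input.

```python
import re


def whitespace_runs(text):
    # Every maximal run of characters that the regex class \s treats as whitespace.
    return re.findall(r'\s+', text)


real_newline = '\n'      # one character: LINE FEED (U+000A)
escaped_newline = r'\n'  # two characters: a backslash followed by the letter n

print(len(real_newline), len(escaped_newline))  # 1 2
print(whitespace_runs('blah<br>\nblah'))        # ['\n']
print(whitespace_runs('blah<br>blah'))          # []
print(whitespace_runs(r'blah<br>\nblah'))       # []
```

So the unescaped string offers a whitespace run for a \s-based substitution to replace, while both the no-newline string and the escaped string offer none.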

I believe the easiest way to get the correct behavior in the example above would be to escape the text before calling widont and unescape it afterwards:

from typogrify.filters import widont

text = 'blah<br>\nblah'
text = text.encode('unicode-escape')  # b'blah<br>\\nblah'
text = text.decode('utf-8')  # 'blah<br>\\nblah'
text = widont(text)  # 'blah<br>\\nblah'
text = text.encode('utf-8')  # b'blah<br>\\nblah'
text = text.decode('unicode-escape')  # 'blah<br>\nblah'
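The same round-trip can be packaged as a reusable wrapper. This is a sketch under stated assumptions: apply_with_escaped_newlines and nbsp_last_gap are hypothetical names, and nbsp_last_gap is a deliberately crude stand-in for widont (not typogrify's regex) so the example runs without typogrify installed. The wrapper assumes the filter only emits ASCII, which holds for an inserted &nbsp;.

```python
import re


def apply_with_escaped_newlines(text, text_filter):
    # Hypothetical helper, not part of typogrify: run text_filter on a copy of
    # text in which newlines, tabs, backslashes and non-ASCII characters are
    # written as backslash escape sequences, so a \s-based regex cannot see
    # the newline; then undo the escaping.
    escaped = text.encode('unicode-escape').decode('ascii')
    filtered = text_filter(escaped)
    # Assumes text_filter emits only ASCII; encode('ascii') raises
    # UnicodeEncodeError otherwise, rather than silently corrupting the text.
    return filtered.encode('ascii').decode('unicode-escape')


def nbsp_last_gap(text):
    # Hypothetical stand-in for widont, for demonstration only: replace the
    # whitespace run before the final word with &nbsp;.
    return re.sub(r'\s+(\S+)$', r'&nbsp;\1', text)


print(repr(nbsp_last_gap('blah<br>\nblah')))  # 'blah<br>&nbsp;blah'
print(repr(apply_with_escaped_newlines('blah<br>\nblah', nbsp_last_gap)))
# 'blah<br>\nblah'
print(repr(apply_with_escaped_newlines('one two<br>\nthree four', nbsp_last_gap)))
# 'one two<br>\nthree&nbsp;four'
```

Spaces are not escaped by the unicode-escape codec, so ordinary widow prevention still happens on the last line; only the newline is hidden from the filter.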

It would be preferable for typogrify to handle this itself, and I think it can be done without any encoding/decoding.
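One way to avoid the round-trip is to narrow the whitespace the filter is allowed to replace. The sketch below is a hypothetical, heavily simplified widont-style filter (widont_keep_newlines is an illustrative name, and the regex is not typogrify's): only spaces and tabs count as a replaceable gap, and the character before the gap must belong to a word rather than be the > of a tag such as <br>.

```python
import re

# Hypothetical, heavily simplified widont-style pattern; NOT typogrify's regex.
# Group 1: last character of the preceding word (never '>' from a tag like <br>).
# The gap: spaces or tabs only, so a newline is never turned into &nbsp;.
# Group 2: the final word plus any trailing whitespace, up to the end of the string.
_LAST_GAP = re.compile(r'([^<>\s])[ \t]+([^<>\s]+\s*)$')


def widont_keep_newlines(text):
    # Join the last two words with a non-breaking space if they are separated
    # by spaces/tabs on the same line; otherwise return text unchanged.
    return _LAST_GAP.sub(r'\1&nbsp;\2', text)


for sample in ['one two three', 'blah<br>\nblah', 'blah\nblah blah']:
    print(repr(widont_keep_newlines(sample)))
# 'one two&nbsp;three'
# 'blah<br>\nblah'
# 'blah\nblah&nbsp;blah'
```

This sketch ignores inline tags around the last words and closing block tags such as </p>, which a widont filter usually has to handle, and it also leaves a plain space directly after a tag (as in 'blah<br> blah') alone. A real fix would keep the existing tag handling and only change which whitespace may be replaced.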

ryneeverett, Oct 25 '14 20:10