html2text
Extra space after a closing emphasis mark
$ echo '<em>hello</em>'{\,,\",:,\[,.,\!,\?}'<br>' | html2text
_hello_ ,
_hello_ "
_hello_ :
_hello_[
_hello_.
_hello_!
_hello_?
Note that in the first three lines of the output there is an extra space after the closing _ emphasis mark.
This is a bug, because Markdown has no problem with punctuation immediately following a closing emphasis mark:
$ echo _hello_{\,,\",:,\[,.,\!,\?} | markdown
<p><em>hello</em>, <em>hello</em>“ <em>hello</em>: <em>hello</em>[ <em>hello</em>. <em>hello</em>! <em>hello</em>?</p>
The same input rendered by GitHub: hello, hello" hello: hello[ hello. hello! hello?
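Why punctuation is fine can be made precise with CommonMark's "left-flanking" and "right-flanking" delimiter-run rules, which GitHub's renderer (GFM) builds on. The Python sketch below is a simplified reading of those rules, assuming only the single characters immediately before and after the mark matter; it ignores nesting and the "multiple of 3" matching rule. The function names are hypothetical and not part of html2text or any Markdown library.

```python
import string
import unicodedata


def _is_ws(ch: str) -> bool:
    # In CommonMark, the start or end of a line counts as whitespace.
    return ch == "" or ch.isspace()


def _is_punct(ch: str) -> bool:
    # ASCII punctuation or any Unicode "P*" category character.
    return ch != "" and (ch in string.punctuation or unicodedata.category(ch).startswith("P"))


def left_flanking(prev: str, nxt: str) -> bool:
    return not _is_ws(nxt) and (not _is_punct(nxt) or _is_ws(prev) or _is_punct(prev))


def right_flanking(prev: str, nxt: str) -> bool:
    return not _is_ws(prev) and (not _is_punct(prev) or _is_ws(nxt) or _is_punct(nxt))


def can_close(mark: str, prev: str, nxt: str) -> bool:
    """Simplified check: can `mark` (between chars `prev` and `nxt`) close emphasis?"""
    if not right_flanking(prev, nxt):
        return False
    if mark[0] == "*":
        return True
    # '_' and '__' must additionally not be left-flanking, unless followed by punctuation.
    return not left_flanking(prev, nxt) or _is_punct(nxt)


def needs_space(mark: str, prev: str, nxt: str) -> bool:
    """True if writing `nxt` directly after the closing mark would break the emphasis."""
    return not can_close(mark, prev, nxt)
```

For _hello_ followed by a comma, can_close("_", "o", ",") is True, so no space is needed; for the _ between foo and bar in _foo_bar_baz_ it is False, which is consistent with how both renderers treat that input later in this issue.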
I guess the extra space is added here:
https://github.com/Alir3z4/html2text/blob/099c4b8bfeea09d640e18324bb1d44f051371940/html2text/init.py#L295-L297
Or here, which explains why the bottom four results don't have the extra space:
https://github.com/Alir3z4/html2text/blob/099c4b8bfeea09d640e18324bb1d44f051371940/html2text/init.py#L860-L868
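The output above is consistent with a rule that inserts a space after a closing mark unless the following text starts with whitespace or one of [ . ! ?. The Python sketch below models that inferred rule (it is not the code at the linked lines, whose exact character set may differ) next to a narrower, hypothetical rule that adds a space only when _ or __ is followed directly by a letter or digit. Both function names are made up for illustration.

```python
# Characters that, per the html2text output above, are NOT preceded by an
# extra space. Inferred from that output only; the real code may differ.
OBSERVED_NO_SPACE = set("[.!?")


def current_separator(following: str) -> str:
    """Model of the observed behaviour after a closing emphasis mark."""
    if not following or following[0].isspace() or following[0] in OBSERVED_NO_SPACE:
        return ""
    return " "


def proposed_separator(mark: str, following: str) -> str:
    """Hypothetical narrower rule: only '_' or '__' before a letter or digit
    needs a space, because a closing underscore cannot close there."""
    if mark.startswith("_") and following[:1].isalnum():
        return " "
    return ""


if __name__ == "__main__":
    # Compare both rules on the punctuation from the echo example.
    for p in [",", '"', ":", "[", ".", "!", "?"]:
        now = f"_hello_{current_separator(p)}{p}"
        new = f"_hello_{proposed_separator('_', p)}{p}"
        print(f"{now:<10} -> {new}")
```

Under the narrower rule none of the seven punctuation cases gets a space, while _foo_ followed by bar still does.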
I would like to add that maybe we should simply not add extra spaces around stressed text at all:
$ for i in _ \* __ \*\*; do echo "${i}foo${i}bar${i}baz${i}"; done
_foo_bar_baz_
*foo*bar*baz*
__foo__bar__baz__
**foo**bar**baz**
My markdown produces:
$ for i in _ \* __ \*\*; do echo "${i}foo${i}bar${i}baz${i}" | markdown; done
<p><em>foo_bar_baz</em></p>
<p><em>foo</em>bar<em>baz</em></p>
<p><strong>foo</strong>bar<strong>baz</strong></p>
<p><strong>foo</strong>bar<strong>baz</strong></p>
But GitHub's rendering disagrees for the third case, __foo__bar__baz__:
foo_bar_baz
foobarbaz
foo__bar__baz
foobarbaz
$ for i in _ \* __ \*\*; do echo "${i}foo${i}bar${i}baz${i}" | markdown | html2text; done
_foo_bar_baz_
_foo_ bar _baz_
**foo** bar**baz**
**foo** bar**baz**
So it seems that, if we add extra spaces at all, it should only be when the stress mark is _ or __; * and ** don't require extra spaces for Markdown to apply the stress, e.g., ***a**b* -> ab renders correctly. This leads to the question: should -e be the default, or should html2text automatically use * wherever _ would require extra spaces (since those spaces irreversibly distort the text)?
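The second option, automatically switching to * where _ would need a space, could look roughly like the Python sketch below. The helper names are hypothetical. It assumes, as in the html2text output above, that <em> is written with _ and <strong> with **, and that the emphasized text ends in a letter or digit (if it ends in punctuation, e.g. *foo!*bar, CommonMark would not close the emphasis with either mark).

```python
def render_em(text: str, following: str) -> str:
    """Hypothetical: write <em> without ever inserting a space.

    Keeps '_' (the style in the output above) but switches to '*' when a
    letter or digit follows, where a closing '_' would not end the emphasis.
    Assumes `text` ends in a letter or digit.
    """
    mark = "*" if following[:1].isalnum() else "_"
    return f"{mark}{text}{mark}{following}"


def render_strong(text: str, following: str) -> str:
    """Hypothetical: '**' can close directly before a letter, so no space.

    Same assumption as above about how `text` ends.
    """
    return f"**{text}**{following}"
```

For example, render_em("foo", "bar") gives *foo*bar, which both the markdown tool and GitHub render as an emphasized foo followed by bar. In html2text itself the opening mark is presumably written before the following text is known, so this approach would need some look-ahead or buffering.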