white space in bold and italic causes problems

Open DrewNeon opened this issue 2 years ago • 9 comments

**bold** is converted to <strong>bold</strong>, as expected. However, it is not converted if there is a white space at the beginning or the end inside the **, i.e. ** bold**, **bold **, and ** bold **. The same problem applies to *italic* and ***bold&italic***, but not to ~~strikethrough~~ or ``code``. Is it a bug?

DrewNeon avatar Mar 29 '22 15:03 DrewNeon

No, it's not. According to John Gruber's specification:

But if you surround an * or _ with spaces, it’ll be treated as a literal asterisk or underscore.

It means that ** bold**, **bold **, and ** bold ** are not valid emphasis: the asterisks are treated as literal characters, so it’s OK that Showdown doesn’t convert them. The same applies to italic, bold + italic, and other combinations of text emphasis.

The cases with code and strikethrough are different because they are expected to show the content inside "as is", so white spaces are taken into account.
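To make the rule above concrete, here is a minimal sketch of a converter that only treats ** as bold when the delimiters touch the text on both sides. It is not Showdown's actual implementation: the regex and the function name convertStrong are assumptions made purely for illustration, and the sketch ignores nesting, italics, and underscores.

```javascript
// Simplified illustration of the "no white space inside the delimiters" rule.
// NOT Showdown's real code: STRONG and convertStrong are hypothetical names.
// The content must start and end with a character that is neither
// white space nor an asterisk, and may not contain asterisks at all.
const STRONG = /\*\*([^\s*](?:[^*]*[^\s*])?)\*\*/g;

function convertStrong(text) {
  return text.replace(STRONG, '<strong>$1</strong>');
}

console.log(convertStrong('**bold**'));   // <strong>bold</strong>
console.log(convertStrong('** bold**'));  // ** bold**   (left as literal text)
console.log(convertStrong('**bold **'));  // **bold **   (left as literal text)
console.log(convertStrong('** bold **')); // ** bold **  (left as literal text)
```

A real parser has to handle far more cases (nested emphasis, mixed * and _, escapes), which is where the edge cases discussed later in this thread come from.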

bandantonio avatar Mar 29 '22 16:03 bandantonio

Emphasis and strong

The behavior for emphasis and strong (italic and bold) is correct.

In the original spec, it says:

But if you surround an * or _ with spaces, it’ll be treated as a literal asterisk or underscore.

The behavior is also consistent with the CommonMark spec, on which GFM is based.

Code

The behavior for inline code is also correct, according to both the original spec and the CommonMark spec. See the examples in both links.

Strikethrough

Strikethrough is a GFM-specific extension. According to the GFM spec:

Strikethrough text is any text wrapped in two tildes (~).

So, according to the written spec, ~~this ~~ should be parsed as strikethrough. However, if you test it on GitHub, ~~this ~~ is not parsed as strikethrough. Regardless, this is a different issue altogether.

tivie avatar Mar 29 '22 16:03 tivie

Thanks for the prompt responses! It's my bad that I didn't dig into the specs. To users, **, ~~, and `` look so similar that I felt they "should" behave the same. Of course, a leading or trailing white space makes no sense: you can't really bold or italicize a white space. It's just that the inconsistency in the specs costs extra effort when developing a WYSIWYG markdown editor, as does the fact that bold and italic share the same symbol. It's also a pity that markdown has no syntax for underline or center alignment, given the handy HTML tags <u></u> and <center></center>. This might not be a big deal for tech docs, but it could hinder markdown's usability in broader scenarios.

BTW, since @tivie mentioned GitHub, is there a standalone JavaScript version of GitHub's markdown editor available?

DrewNeon avatar Mar 30 '22 00:03 DrewNeon

Not that I'm aware of. GitHub's editor uses Redcarpet, which is written in Ruby.

tivie avatar Mar 30 '22 08:03 tivie

After playing around further, I found something confusing.

**abc**de** is parsed to <strong>abc</strong>de**, as expected. As previously discussed in this thread, when I add a white space before abc, the first ** is not treated as bold syntax, i.e. ** abc**de** is parsed to ** abc<strong>de</strong>. That much is understood. The weird thing is that when I add a white space after abc, I'd expect the same kind of result as with a white space before abc, but it's parsed to **abc *<em>de</em>*. I just wonder why.

The above abstract example may seem meaningless. I managed to produce a real scenario.

"Never trouble ** **trouble** ** till ** **trouble** ** troubles you." The bold words within **, i.e. **trouble**, are nouns.

The following line is parsed directly by GitHub's editor and shows the expected outcome.

"Never trouble ** trouble ** till ** trouble ** troubles you." The bold words within **, i.e. trouble, are nouns.

However, showdown parses it to

"Never trouble ** trouble ** till ** trouble ** troubles you." The bold words within **, i.e. *<em>trouble</em>*, are nouns.

Please note the code part above and do copy the full example code to showdown and see.

DrewNeon avatar Apr 03 '22 11:04 DrewNeon

Yeah, it's a bug. These nested bold/italic edge cases are a pain.

The problem is here ---> **, i.e. **trouble**, the 2 asterisks at the beginning.

For instance, this is parsed correctly:

"Never trouble ** trouble ** till ** trouble ** troubles you." The bold words within \*\*, i.e. trouble, are nouns.
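The workaround above relies on backslash escapes, which Markdown supports for literal * and _ characters. For an editor that needs to insert literal asterisks programmatically, a small helper along these lines could do the escaping. The function name escapeEmphasisChars is hypothetical and not part of Showdown.

```javascript
// Hypothetical helper (not a Showdown API): backslash-escape every literal
// * and _ so that a Markdown parser treats them as plain characters
// instead of emphasis delimiters.
function escapeEmphasisChars(text) {
  // "$&" in the replacement string stands for the matched character.
  return text.replace(/[*_]/g, '\\$&');
}

console.log(escapeEmphasisChars('The bold words within **'));
// The bold words within \*\*
```

Note that this escapes every asterisk and underscore it is given, so it should only be applied to text that is meant to be literal, not to text that already contains intended emphasis.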

tivie avatar Apr 03 '22 19:04 tivie

Indeed, bold and italic sharing the same symbol is really a pain in the ass! I think the bug is related to the white space directly before the closing **, please try **abc **de**. It is parsed correctly if you escape the first **.

Moreover, I also tried some extreme cases: six * before and after a word, i.e. ******abc******, is parsed correctly, but seven or eight * are not.

DrewNeon avatar Apr 03 '22 20:04 DrewNeon

Yeah. Usually, if you want maximum portability, it is good practice to mix and match _ and *.

For instance, in your case:

"Never trouble ** __trouble__ ** till ** __trouble__ ** troubles you." The bold words within **, i.e. __trouble__, are nouns.

"Never trouble ** trouble ** till ** trouble ** troubles you." The bold words within **, i.e. trouble, are nouns.

see demo

tivie avatar Apr 03 '22 22:04 tivie

Exactly! It's much easier and clearer to use a different symbol for each syntax: **_bold&italic_** is definitely better than ***bold&italic***. If the same * symbol did not have to be parsed differently depending on how many times it is repeated, this bug would be avoided automatically. I don't understand why markdown still allows such confusion after so many years of development.
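For a WYSIWYG editor, both points from this thread (no white space inside the delimiters, and mixing _ with *) can be handled when the markup is generated. The following is only a sketch under those assumptions; wrapEmphasis is a hypothetical helper, not a Showdown API.

```javascript
// Hypothetical editor helper (not part of Showdown): wrap a selection in
// emphasis delimiters while keeping leading and trailing white space
// OUTSIDE the delimiters, so the delimiters always touch the text.
function wrapEmphasis(selection, open, close = open) {
  // lead = leading white space, core = the text itself, trail = trailing white space
  const [, lead, core, trail] = /^(\s*)([\s\S]*?)(\s*)$/.exec(selection);
  if (core === '') return selection; // only white space selected: nothing to emphasize
  return `${lead}${open}${core}${close}${trail}`;
}

// JSON.stringify is used only to make the surrounding white space visible.
console.log(JSON.stringify(wrapEmphasis(' bold ', '**')));              // " **bold** "
console.log(JSON.stringify(wrapEmphasis('bold&italic', '**_', '_**'))); // "**_bold&italic_**"
```

Passing separate opening and closing delimiters makes it possible to emit the mixed form **_text_** for bold plus italic instead of ***text***.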

DrewNeon avatar Apr 03 '22 22:04 DrewNeon