
Emphasis not rendered when followed by a comma, period or semicolon + space

Open Danita opened this issue 6 years ago • 8 comments

When parsing something like hello *, this is emphasized* etc., the part between the asterisks is not rendered as emphasis. This happens at least when the first asterisk is followed by a comma, period, or semicolon and then a space. See the following image:

[image: captura]

Danita avatar Jun 15 '18 19:06 Danita

Interestingly, PHP Markdown is the only parser with that behavior, so I really ought to fix this. Thank you for the report.

michelf avatar Jun 15 '18 19:06 michelf

I've tracked the problem down to this line, where the characters ,.;: are not allowed right after a * or a _: https://github.com/michelf/php-markdown/blob/lib/Michelf/Markdown.php#L1252 I wonder why that is.

Danita avatar Jun 15 '18 19:06 Danita

I'm trying to remember, but can't find the reason. A simple way to find out would be to remove them, make a pull request and look at the failing tests from the auto tester.

michelf avatar Jun 15 '18 20:06 michelf

Are the tests running here on GitHub (via Travis), or should I run them locally?

Danita avatar Jun 15 '18 20:06 Danita

Here, using Travis. You can run them locally too.

michelf avatar Jun 15 '18 20:06 michelf

I think I found the reason. It's to avoid misreading asterisks that are meant to be literal asterisks in situations like this:

This is an asterisk*. That is *emphasis*.

Many implementations get it right, including GitHub, which renders it as:

This is an asterisk*. That is <em>emphasis</em>.

While this works in PHP Markdown too, the way it's implemented breaks other cases.

The basic idea to make that work is that the opening asterisk can be anywhere but at the end of a word, and so we check for whitespace after the asterisk. But since words are often followed by punctuation, punctuation counts as whitespace in this situation. That makes sense in the general case, but not in your example where there is no word preceding the punctuation. So I think the fix would be to not count punctuation as whitespace if the asterisk is preceded by whitespace.
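The rule above and the proposed fix can be sketched roughly as follows. This is a minimal Python illustration, not PHP Markdown's actual code: the function names `can_open_current` and `can_open_proposed` are hypothetical, the punctuation set is the ,.;: mentioned earlier in the thread, and treating an asterisk at the very end of the text as a non-opener is a simplifying assumption.

```python
import re

def can_open_current(text: str, i: int) -> bool:
    """Rule as described in the thread (simplified): the asterisk at index i
    cannot open emphasis if it is followed by whitespace, or by one of ,.;:
    and then whitespace."""
    rest = text[i + 1:]
    if rest == "":
        return False  # assumption: nothing left to emphasize
    return re.match(r"[.,:;]?\s", rest) is None

def can_open_proposed(text: str, i: int) -> bool:
    """Proposed fix (hypothetical sketch): punctuation only counts as
    whitespace when the asterisk is attached to a preceding word."""
    rest = text[i + 1:]
    if rest == "" or rest[0].isspace():
        return False
    preceded_by_space = i == 0 or text[i - 1].isspace()
    if preceded_by_space:
        return True  # no word before the asterisk, so punctuation is fine
    return re.match(r"[.,:;]\s", rest) is None

reported = "hello *, this is emphasized* etc."
literal = "This is an asterisk*. That is *emphasis*."

for sample in (reported, literal):
    first = sample.index("*")
    print(sample, can_open_current(sample, first), can_open_proposed(sample, first))
```

Under these assumptions, the first asterisk in the reported example is rejected by the current rule but accepted by the proposed one, while the literal asterisk after "asterisk" is rejected by both.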

michelf avatar Jun 16 '18 12:06 michelf

Sounds about right. In our case, though, it was enough to remove the punctuation from the regexes: our Markdown is generated from a rich text editor, so literal asterisks are already escaped.

Danita avatar Jun 19 '18 17:06 Danita

I can confirm this behavior with the Markdown text **; )** that doesn't get transformed into <strong>; )</strong>. Other unsuccessful tries:

  • ** ;)**
  • **;) **

Successful try:

  • **;)**

Hope it helps.

Related to https://github.com/friendica/friendica/issues/6938

MrPetovan avatar May 19 '19 03:05 MrPetovan