parsedown Parsedown does not treat emphasis tags (*,

Daring Fireball's documentation states:

Markdown treats asterisks (*) and underscores (_) as indicators of emphasis. (...) But if you surround an * or _ with spaces, it’ll be treated as a literal asterisk or underscore. ( https://daringfireball.net/projects/markdown/syntax#em )

So, in most Markdown parsers (including GitHub): *This is italic* produces: This is italic

and: * This is not italic * produces: * This is not italic *

However, the current version of Parsedown treats both as italic. This is confirmed with the current version of the http://parsedown.org/demo site, which shows Markdown PHP working as expected, and Parsedown tripping over.

Sep 27 '17 15:09 richjeffery

Happy birthday, bug! Any further movement on this getting resolved, as it's been sitting as a bug within my application for a year now.

Sep 27 '18 15:09 richjeffery

I can replicate this on release 1.7.1 and directly on master - but not on the demo site http://parsedown.org/demo. There it has the same behaviour as Markdown PHP 1.3 and the results of Guthub preview. Any ideas on what version is being used on the demo site?

Oct 14 '18 17:10 alec-w

Looks like this can be fixed by amending the $EmRegex to exclude matches that start/end with whitespace. The current regex is

protected $EmRegex = array(
    '*' => '/^[*]((?:\\\\\*|[^*]|[*][*][^*]+?[*][*])+?)[*](?![*])/s',
    '_' => '/^_((?:\\\\_|[^_]|__[^_]*__)+?)_(?!_)\b/us',
);

changing this to

protected $EmRegex = array(
    '*' => '/^[*]([^\s](?:\\\\\*|[^*]|[*][*][^*]+?[*][*])+?[^\s])[*](?![*])/s',
    '_' => '/^_([^\s](?:\\\\_|[^_]|__[^_]*__)+?[^\s])_(?!_)\b/us',
);

sorts it. Similarly, the $StrongRegex ought to be updated too, currently

protected $StrongRegex = array(
    '*' => '/^[*]{2}((?:\\\\\*|[^*]|[*][^*]*+[*])+?)[*]{2}(?![*])/s',
    '_' => '/^__((?:\\\\_|[^_]|_[^_]*+_)+?)__(?!_)/us',
);

and following the same pattern as above this becomes

protected $StrongRegex = array(
    '*' => '/^[*]([^\s](?:\\\\\*|[^*]|[*][*][^*]+?[*][*])+?[^\s])[*](?![*])/s',
    '_' => '/^__(([^\s]?:\\\\_|[^_]|_[^_]*+_)+?[^\s])__(?!_)/us',
);

Testing this with

$parsedown = new Parsedown();
echo $parsedown->line('*italic* * not italic* *also not italic * *italic again*');
echo '</br>';
echo $parsedown->line('_italic_ _ not italic_ _also not italic _ _italic again_');

previously gave italic not italic also not italic italic again italic not italic also not italic italic again but now gives italic * not italic* *also not italic * italic again italic _ not italic_ _also not italic _ italic again gives the correct output now. @erusev if you agree with the above I can make a PR for it - or am I off the mark here?

Oct 14 '18 20:10 alec-w

I was actually just in the process of exploring a fix around moving more towards the CommonMark spec for defining the opening and closing delimiter runs: since this'll help avoid some other discrepancies. I'm hoping that it might be possible to do so in a way that doesn't require leaning on intersection rules since resolving precedence rules about intersecting inlines is currently out of scope (see https://github.com/erusev/parsedown/issues/588#issuecomment-377221179). If we're pulling apart the emphasis parsing, then it'll also be an opportunity to address some long-standing more widespread escaping issues (#434)—if we can tackle emphasis then it should be fairly easy to address the simpler inlines.

If I can't make anything nice work around the CommonMark definitions, then the solution you've outlined above would also seem address the issue raised here fairly directly. (Note that it doesn't tackle part (b) in the CommonMark definition's of left/right flanking though: https://spec.commonmark.org/0.28/#left-flanking-delimiter-run, IMO we should prefer a solution that incorporates the CommonMark definition if possible).

Oct 14 '18 21:10 aidantwoods

alec-w's response works well, but there is a minor mistake in the strong regex - I believe it should be:

protected $StrongRegex = array(
        '*' => '/^[*]{2}([^\s](?:\\\\\*|[^*]|[*][*][^*]+?[*][*])+?[^\s])[*]{2}(?![*])/s',
        '_' => '/^__(([^\s]?:\\\\_|[^_]|_[^_]*+_)+?[^\s])__(?!_)/us',
    );

Apr 16 '19 13:04 richjeffery

Hi guys!

Any updates related to this issue?

Thanks!

May 08 '19 11:05 xserrat

Quick update, turns out alec-w's fix breaks if you have just two characters between your tags.

i.e.: this _is_ italic will not work correctly, nor will this **is** bold.

Aug 07 '19 15:08 richjeffery

Figured out a correction that seems to work without breaking with only two or one character between your emphasis tags:

    protected $StrongRegex = array(
        '*' => '/^[*]{2}(?:([^\*\s]|[^\W](?:\\\\\*|[^*]|[*][^*]*+[*])*?[^\W]))[*]{2}(?![*])/s',
        '_' => '/^[_]{2}([^\W_]|(?:[^\W](?:\\\\_|[^_]|_[^_]*+_)*?[^\s]))[_]{2}(?!_)/s',
    );

    protected $EmRegex = array(
        '*' => '/^[*](?:([^\*\s]|[^\W](?:\\\\\*|[^*]|[*][^*]*+[*])*?[^\W]))[*](?![*])/s',
        '_' => '/^[_]([^\W_]|(?:[^\W_](?:\\\\_|[^_]|_[^_]*+_)*?[^\s_]))[_](?!_)/s',
    );

Aug 23 '19 14:08 richjeffery

Can't we fix this with a lookaround?

Here's how it'd look for $EmRegex['_']:

- /^_((?:\\\\_|[^_]|__[^_]*__)+?)_(?!_)\b/us
+ /^_(?!\s)((?:\\\\_|[^_]|__[^_]*__)+?)(?<!\s)_(?!_)\b/us

Basically, the modification adds (?!\s) after the opening _ and a (?<!\s) before the closing _

Tests seem to pass.

Here's a quick demo: https://regex101.com/r/cBxRKo/1

Aug 23 '19 18:08 erusev

@erusev - that's definitely a good solution. Whilst it requires more steps to complete on short phrases (e.g. _hello_) it definitely is a significantly superior regex in terms of completion steps for longer sentences. Thanks!

Aug 27 '19 08:08 richjeffery

Tests seem to pass.

Here's a quick demo: https://regex101.com/r/cBxRKo/1

Looks like the testing may be lacking some nuance - just realised today that a bug I found with ***Test*** (which should be bold-italic) renders as Bold with an asterisk on each side when using this change.

Sep 21 '21 12:09 richjeffery

parsedown
parsedown copied to clipboard

Parsedown does not treat emphasis tags (*, _, etc.) as literal when spaced out

parsedown parsedown copied to clipboard

Parsedown does not treat emphasis tags (*, _, etc.) as literal when spaced out

parsedown
parsedown copied to clipboard