highlight.js icon indicating copy to clipboard operation
highlight.js copied to clipboard

fix(markdown) strong/emphasis requires whitespace after

Open joshgoebel opened this issue 2 years ago • 5 comments

Resolves #3519. Prevents false positive of emphasis when bullets are nested inside block quotes. This doesn't detect the bullets, it only fixes the false positive.

Checklist

  • [ ] Added markup tests, or they don't apply here because...
  • [ ] Updated the changelog at CHANGES.md

joshgoebel avatar Apr 12 '22 18:04 joshgoebel

Gonna start a quick discussion on this one. I did some light reading on the commonmark spec and learned two things:

  • newlines count as terminating the given bold/italic sequence
  • we'd get false positives with __ too if there's a space after it

Given that any whitespace (including newlines) terminate the sequence, maybe use \s instead of [ ]. So I've got this snippet:

const BOLD = {
  className: 'strong',
  contains: [], // defined later
  variants: [
    {
      begin: /_{2}(?!\s)/, // <- changed
      end: /_{2}/
    },
    {
      begin: /\*{2}(?!\s)/, // <- changed
      end: /\*{2}/
    }
  ]
};
const ITALIC = {
  className: 'emphasis',
  contains: [], // defined later
  variants: [
    {
      begin: /\*(?![*\s])/, // <- changed
      end: /\*/
    },
    {
      begin: /_(?![_\s])/, // <- changed
      end: /_/,
      relevance: 0
    }
  ]
};

(I'd submit it as a review but GitHub doesn't let me make changes on non-touched lines)

image

This PR as-is would result in the following,

image

allejo avatar Jul 08 '22 07:07 allejo

The markdown I'm using:

** i shouldn't be italic**

* and I shouldn't be italic either*

__ not really bold__

_ not italic_

Hello

**actually bold**

__again, bold__

*italic*
_oh i'm italic_

How GitHub renders this


** i shouldn't be italic**

  • and I shouldn't be italic either*

__ not really bold__

_ not italic_

Hello

actually bold

again, bold

italic oh i'm italic

allejo avatar Jul 08 '22 07:07 allejo

  • newlines count as terminating the given bold/italic sequence

Not completely accurate:

**not bold
**

*not italic
*

**should be
bold**

*should be
italic*

**not bold **

*not italic *

should be bold

should be italic

double-beep avatar Jul 08 '22 07:07 double-beep

So I've got this snippet:

Did you try and see if all the tests still passed with that snippet?

joshgoebel avatar Jul 08 '22 07:07 joshgoebel

So I've got this snippet:

Did you try and see if all the tests still passed with that snippet?

Ran it against npm run build_and_test and just got an auto-detect failure (but I think that's due to my Node version; this happens on main for me too).

  1720 passing (14s)
  3 pending
  1 failing

  1) hljs.highlightAuto()
       should be detected as sfz:

      AssertionError: sample.txt should be detected as sfz, but was gradle
      + expected - actual

      -gradle
      +sfz

Not completely accurate:

**not bold
**

*not italic
*

**should be
bold**

*should be
italic*

Ahhh good catch on this, you're right. Anything that's not "punctuation" before the closing sequence keeps it bold/italic. But if it's just a newline and closing, then it's not valid since the newline is considered "punctuation"

allejo avatar Jul 08 '22 08:07 allejo

I don't mean to overstep and push to your branch, @joshgoebel! I'd like to vote that we fix the immediate false positives by disallowing newlines inside of bolds and italics. And in the future, come up with a more robust expression that matches the markdown spec and can handle newlines (or whatever new APIs that may be added).

allejo avatar Oct 16 '22 04:10 allejo