
Update packages (sublime syntaxes)

Open robinst opened this issue 6 years ago • 11 comments

I'm looking at updating testdata/Packages and syntect's built-in dumps of them.

After running git submodule update --recursive --remote && make packs, the second step fails with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
ParseSyntax(InvalidYaml(ScanError { mark: Marker { index: 9438, line: 256, col: 12 },
info: "while parsing a node, did not find expected node content" }),
Some("testdata/Packages/Go/Go.sublime-syntax"))', libcore/result.rs:1009:5

I suspect this is something that some YAML parsers don't have a problem with, but the one we use for syntect does, or something that used to be allowed with an older YAML spec. I'll raise a PR for https://github.com/sublimehq/Packages/.

robinst avatar Jan 08 '19 03:01 robinst

see also https://github.com/chyh1990/yaml-rust/issues/118

keith-hall avatar Jan 08 '19 04:01 keith-hall

I've raised a PR to fix yaml-rust: https://github.com/chyh1990/yaml-rust/pull/122

Unfortunately it looks like yaml-rust is not actively maintained.

Should we:

  1. Regenerate the packs with the fix but keep the yaml dependency as-is (not sure if possible)
  2. Depend on a fork with the fix until a new version of yaml-rust comes out
  3. Patch the Go syntax in our submodule until yaml-rust is fixed

What do people think?

robinst avatar Mar 01 '19 03:03 robinst

Depending on a fork sounds best to me

trishume avatar Mar 01 '19 03:03 trishume

Good news: my fix was merged and yaml-rust 0.4.3 was released, so we don't need to fork :).

But there's another problem, with the Python syntax:

     Running `target/debug/examples/gendata synpack testdata/Packages assets/default_newlines.packdump assets/default_nonewlines.packdump assets/default_metadata.packdump testdata/DefaultPackage`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseSyntax(RegexCompileError("\\((?=(?x:\n  \\s+                      # whitespace\n  | [urfb]*\"(?:\\\\.|[^\"])*\" # strings\n  | [urfb]*\'(?:\\\\.|[^\'])*\' # ^\n  | [\\d.ej]+               # numerics\n  | [+*/%@-] | // | and | or # operators\n  | (\\b[[:alpha:]_][[:alnum:]_]*\\b *\\. *)*\\b[[:alpha:]_][[:alnum:]_]*\\b               # a path\n)*,|\\s*\\*(\\b[[:alpha:]_][[:alnum:]_]*\\b *\\. *)*\\b[[:alpha:]_][[:alnum:]_]*\\b)", Error(-114, target of repeat operator is invalid)), Some("testdata/Packages/Python/Python.sublime-syntax"))', src/libcore/result.rs:997:5

The regex with better formatting:

\((?=(?x:
  \s+                      # whitespace
  | [urfb]*"(?:\\.|[^"])*" # strings
  | [urfb]*'(?:\\.|[^'])*' # ^
  | [\d.ej]+               # numerics
  | [+*/%@-] | // | and | or # operators
  | (\b[[:alpha:]_][[:alnum:]_]*\b *\. *)*\b[[:alpha:]_][[:alnum:]_]*\b               # a path
)*,|\s*\*(\b[[:alpha:]_][[:alnum:]_]*\b *\. *)*\b[[:alpha:]_][[:alnum:]_]*\b)

The problem is this bit: \b * (note the space between)

Because this is within a (?x:...) where whitespace is ignored, it's equivalent to \b* which is invalid.
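To illustrate the point above, here is a minimal sketch of how free-spacing mode reads a pattern. It is not Oniguruma's or syntect's code; strip_extended is a hypothetical helper, and it assumes the simplified rules that unescaped whitespace is dropped, # starts a comment running to the end of the line, and character classes are kept verbatim.

```rust
/// Hypothetical helper: approximates what a regex engine "sees" in
/// extended (free-spacing) mode. Simplified assumptions:
/// - an escaped character (backslash + next char) is kept as-is
/// - text inside [...] is kept verbatim, including nested [:alpha:] classes
/// - outside classes, whitespace is dropped and `#` comments are skipped
fn strip_extended(pattern: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars();
    let mut class_depth = 0usize;
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Keep the escape and the character it escapes.
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '[' => {
                class_depth += 1;
                out.push(c);
            }
            ']' if class_depth > 0 => {
                class_depth -= 1;
                out.push(c);
            }
            '#' if class_depth == 0 => {
                // Skip the comment up to and including the newline.
                for d in chars.by_ref() {
                    if d == '\n' {
                        break;
                    }
                }
            }
            c if c.is_whitespace() && class_depth == 0 => {
                // Ignored in extended mode.
            }
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    // The fragment from the Python syntax: a word boundary, a space, then `*`.
    let fragment = r"\b *\. *";
    println!("{}", strip_extended(fragment));
}
```

Under these assumptions the space between \b and * disappears, so the quantifier ends up applying to the \b assertion itself, which is what onig rejects.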

Note that the pattern comes from a variable, here: https://github.com/sublimehq/Packages/blob/master/Python/Python.sublime-syntax#L35

So now I'm wondering:

  • Is this just a bug in the syntax?
  • Does Sublime Text, when it includes the regex of a variable, wrap that regex in (?-x:...) to disable extended mode? In that case, we can do the same in syntect.
  • Does the engine that Sublime Text uses treat (?x:\b *) differently than onig?

@keith-hall maybe you can help out? :)

robinst avatar Mar 08 '19 06:03 robinst

I believe it is a bug in the syntax definition and a bug in ST that it "works": https://github.com/SublimeTextIssues/Core/issues/2354

keith-hall avatar Mar 08 '19 06:03 keith-hall

Hmm, that looks like a similar problem, but I'm not sure it's the same. In our case, we only have a (?x:...), not a bare (?x)...

robinst avatar Mar 08 '19 06:03 robinst

Good point. Maybe ST's regex engine ignores the \b* completely rather than reporting it as an error? I'm not at a computer to experiment atm. I'm pretty certain ST doesn't implicitly wrap variables in a (?-x:...). I guess this should be possible to test in ST by trying a variable that relies on extended mode and setting it from where it is referenced, then seeing how it matches text, and/or by trying an invalid pattern (e.g. missing a closing paren) to see what "expanded" pattern ST shows in the error message.

keith-hall avatar Mar 08 '19 06:03 keith-hall

You were right! So:

  1. \b* does not result in an error in Sublime Text's regex engine

  2. An included pattern is not implicitly wrapped in (?-x: ...), e.g. this:

    variables:
      var: a b c
    contexts:
      main:
        - match: '(?x: {{var}})'
          scope: test.good

    Matches abc, so the spaces in the variable are ignored under the enclosing (?x:...).

  3. If I force Sublime Text to use the Oniguruma engine by appending (?<!0), it results in the same error:

[Screenshot: Screen Shot 2019-03-11 at 17 04 03]

So that means it's just a bug in the Python syntax. I'll raise a PR for it.
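The behaviour in point 2 can be pictured as plain textual substitution. The sketch below is an assumption about the observable behaviour only, not Sublime Text's or syntect's actual implementation; expand_variables is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replaces each `{{name}}` with the variable's text
/// verbatim, without wrapping it in `(?-x:...)` or any other group.
/// Unknown names and unterminated `{{` are left untouched.
fn expand_variables(pattern: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Keep the reference as written if the variable is unknown.
                        out.push_str("{{");
                        out.push_str(name);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // No closing braces: copy the remainder unchanged.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("var", "a b c");
    // The variable's text lands inside the caller's (?x:...) group as-is.
    println!("{}", expand_variables("(?x: {{var}})", &vars));
}
```

Because the variable's spaces end up inside the referencing pattern's extended-mode group, they are ignored there, which is consistent with the test above matching abc.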

robinst avatar Mar 11 '19 06:03 robinst

Thanks for taking the time to investigate and solve this @robinst :)

keith-hall avatar Mar 11 '19 06:03 keith-hall

PR: https://github.com/sublimehq/Packages/pull/1897

robinst avatar Mar 11 '19 06:03 robinst

Raised a PR with the updates here: #246

robinst avatar Mar 24 '19 10:03 robinst