peggy Error parsing strings with curly braces in JavaScript blocks

This is an example that breaks the parser:

str.replace(/\$\{/g, '\\${');

It throws:

Expected code block but "\n" found.

This happens in my language playground.

As a workaround, I've used an escaped value:

str.replace(/\$\x7b/g, '\\$\x7b');

Curly braces in strings and regexes confuse the parser. I tried this on the website:

{
  function foo(x) {
     return x.replace(/\}/g, '}');
  }
}

and it throws random errors.

Feb 14 '22 15:02 jcubic

Yes, this is a known, documented limitation. It will be eliminated in the future, once an API for replacing the part of the grammar that parses action code has been developed (the goal of some plugins for pegjs/peggy is to replace JavaScript with another language).

Feb 14 '22 16:02 Mingun

That's interesting. Do you plan to support one particular language or a bunch of them?

Peggy has a nice syntax, and it would be nice to be able to use a similar syntax for other languages. Do you have anything ready?

BTW: That's funny:

{
  function foo(x) {
     // { {
     return x.replace(/\}/g, '}');
  }
}
start = "x"

This is valid!
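A rough sketch of why the commented version passes, assuming the code block is delimited purely by counting balanced braces with no awareness of strings, regexes or comments (an assumption consistent with the behaviour above, not a copy of Peggy's code). `naiveBlockEnd` is a hypothetical helper written for this illustration:

```javascript
// Hypothetical helper: returns the index just past the brace that balances
// the one at `start`, counting every brace regardless of context.
function naiveBlockEnd(src, start) {
  let depth = 0;
  for (let i = start; i < src.length; i++) {
    if (src[i] === "{") {
      depth++;
    } else if (src[i] === "}") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1; // braces never balanced
}

// The failing block: the braces in the regex and the string close it early.
const broken = [
  "{",
  "  function foo(x) {",
  "     return x.replace(/\\}/g, '}');",
  "  }",
  "}",
].join("\n");

// The "funny" block: two extra opening braces in the comment restore balance.
const balanced = [
  "{",
  "  function foo(x) {",
  "     // { {",
  "     return x.replace(/\\}/g, '}');",
  "  }",
  "}",
].join("\n");

console.log(naiveBlockEnd(broken, 0) === broken.length);     // false
console.log(naiveBlockEnd(balanced, 0) === balanced.length); // true
```

Under that assumption the first block is considered closed at the `}` inside the string literal, so the rest of the function is read as grammar text, which would explain the seemingly random errors.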

Feb 14 '22 20:02 jcubic

I plan to make an API that will allow plugins to replace the source-code-parsing part of the peggy grammar, so that plugins for other languages can implement a minimal subset to correctly parse braces for their languages. Peggy itself will contain only a JS parser subset.

When this is implemented, I'll open issues/PRs for the known plugins.

Feb 15 '22 05:02 Mingun

Since this is documented, can we close this?

Jun 01 '22 00:06 hildjj

I think we can leave this open until the proposed solution (pluggable CodeBlock parsers) is implemented. This, however, requires some work to design a way of composing grammars, which is also required for the import feature. Of course, we could implement a special mechanism just to support this case, but I think it would be better to use a generic solution.

Jun 01 '22 04:06 Mingun

To summarise what I said in the other issue: Accounting for mismatched braces in strings (including template literals) and comments doesn't require the parser to know JavaScript, or even for the JavaScript to be valid. Recognising those inside code snippets is pretty simple and can be used to effectively escape braces.

The one thing that poses a problem is regex literals since / is also the division operator, and the rules for when something is division and when it's a regular expression can be complex. The most annoying edge case is determining whether a preceding pair of braces { ... } is an object or a code block, but if we say that dividing anything other than a number, an identifier or ) is user error, it becomes a lot simpler.

I haven't checked the performance of such a solution, but I expect the impact to be negligible compared to implementing a full ES parser.
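To make the idea concrete, here is a minimal sketch of such a scanner. It is not Peggy's implementation; `blockEnd` and `skipQuoted` are hypothetical names, and as a simplification it neither recognises regex literals nor handles template literals nested inside `${...}`:

```javascript
// Sketch only: finds the end of a brace-delimited block while ignoring
// braces inside string literals, template literals and comments.
function blockEnd(src, start) {
  let depth = 0;
  let i = start;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '"' || ch === "'" || ch === "`") {
      i = skipQuoted(src, i); // jump past the whole literal
    } else if (ch === "/" && next === "/") {
      const nl = src.indexOf("\n", i); // line comment runs to end of line
      i = nl === -1 ? src.length : nl + 1;
    } else if (ch === "/" && next === "*") {
      const close = src.indexOf("*/", i + 2); // block comment
      i = close === -1 ? src.length : close + 2;
    } else {
      if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) return i + 1;
      i++;
    }
  }
  return -1; // braces never balanced
}

// Returns the index just past the closing quote that matches src[open].
function skipQuoted(src, open) {
  const quote = src[open];
  let i = open + 1;
  while (i < src.length) {
    if (src[i] === "\\") i += 2; // skip the escaped character
    else if (src[i] === quote) return i + 1;
    else i++;
  }
  return src.length; // unterminated literal
}

const code = "{ return s.replace('${', \"}\"); // }\n}";
console.log(blockEnd(code, 0) === code.length); // true
```

The scanner never checks that the JavaScript is valid; it only needs to know where quoted text and comments begin and end. A brace inside a regex literal such as `/\}/` is still counted, which is the remaining problem described above.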

Feb 22 '23 21:02 gamesaucer
