peggy
Error parsing strings with curly braces in JavaScript blocks
This is an example that breaks the parser:
str.replace(/\$\{/g, '\\${');
it throws:
Expected code block but "\n" found.
This happens in my language playground.
As a workaround, I've used an escaped value:
str.replace(/\$\x7b/g, '\\$\x7b');
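A quick check, in plain JavaScript outside of any grammar, that the escaped form should behave the same as the original call. The `input` value is made up for illustration:

```javascript
// Hypothetical input, chosen only to exercise the replacement.
const input = 'price: ${amount}';

// Original form: contains literal braces, which trip up the grammar parser.
const direct = input.replace(/\$\{/g, '\\${');

// Workaround: \x7b is the hex escape for "{", so no literal brace appears.
const escaped = input.replace(/\$\x7b/g, '\\$\x7b');

console.log(direct === escaped); // true
```

Both calls produce the same string, so the workaround only changes how the source text looks to the grammar parser.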
Curly braces in strings and regexes confuse the parser. I tried this on the website:
{
  function foo(x) {
    return x.replace(/\}/g, '}');
  }
}
and it throws random errors.
Yes, this is a known, documented limitation. It will be eliminated in the future, once an API for replacing the part of the grammar that parses action code has been developed (the goal of some pegjs/peggy plugins is to replace JavaScript with another language).
That's interesting. Do you plan to support one particular language or a bunch of them?
Peggy has a nice syntax, and it would be great to be able to use similar syntax for other languages. Do you have anything ready?
BTW: That's funny:
{
  function foo(x) {
    // { {
    return x.replace(/\}/g, '}');
  }
}
start = "x"
this is valid!
I plan to make an API that will allow plugins to replace the part of the Peggy grammar that parses source code, so that plugins for other languages can implement the minimal subset needed to parse braces correctly for their languages. Peggy itself will contain only a JS parser subset.
Once this is implemented, I'll open issues/PRs against known plugins.
Since this is documented, can we close this?
I think we can leave this open until the proposed solution (pluggable CodeBlock parsers) is implemented. This, however, requires some work to design a way of composing grammars, which is also required for the import feature. Of course, we could implement a special mechanism just to support this case, but I think it would be better to use a generic solution.
To summarise what I said in the other issue: Accounting for mismatched braces in strings (including template literals) and comments doesn't require the parser to know JavaScript, or even for the JavaScript to be valid. Recognising those inside code snippets is pretty simple and can be used to effectively escape braces.
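A minimal sketch of that idea, assuming a standalone scanner rather than anything in Peggy itself. The names `findBlockEnd` and `skipQuoted` are hypothetical. Regex literals are deliberately not handled, and a template literal is treated as opaque up to its closing backtick, so nested templates inside `${...}` are out of scope:

```javascript
// Hypothetical helpers, not part of Peggy's API.

// Skip a quoted run starting at src[i] (', " or `).
// Returns the index just past the closing quote, or -1 if unterminated.
function skipQuoted(src, i) {
  const quote = src[i];
  for (let j = i + 1; j < src.length; j++) {
    if (src[j] === '\\') j++; // skip the escaped character
    else if (src[j] === quote) return j + 1;
  }
  return -1;
}

// Given src[start] === '{', return the index of the matching '}',
// ignoring braces inside strings, template literals and comments.
// Returns -1 if no matching brace is found.
function findBlockEnd(src, start) {
  if (src[start] !== '{') throw new Error('expected "{" at start');
  let depth = 0;
  let i = start;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: skip to the end of the line.
      const eol = src.indexOf('\n', i);
      i = eol === -1 ? src.length : eol + 1;
    } else if (ch === '/' && next === '*') {
      // Block comment: skip past the closing */.
      const end = src.indexOf('*/', i + 2);
      if (end === -1) return -1;
      i = end + 2;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      i = skipQuoted(src, i);
      if (i === -1) return -1;
    } else {
      if (ch === '{') depth++;
      else if (ch === '}' && --depth === 0) return i;
      i++;
    }
  }
  return -1;
}
```

The scanner never checks that the code is valid JavaScript; it only needs to know where quoted text and comments begin and end.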
The one thing that poses a problem is regex literals, since / is also the division operator, and the rules for when something is division and when it's a regular expression can be complex.
The most annoying edge case is determining whether a preceding pair of braces { ... } is an object or a code block, but if we say that dividing anything other than a number, an identifier or ) is user error, it becomes a lot simpler.
I haven't checked the performance of such a solution, but I expect the impact to be negligible compared to implementing a full ES parser.
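The simplified rule for slashes could look roughly like this. It is only a sketch under the assumption stated above: a slash counts as division when the previous non-whitespace character ends a number, an identifier or a closing parenthesis, and as the start of a regex otherwise. The name `slashStartsRegex` is hypothetical, only ASCII identifiers are considered, and keywords such as `return` are not distinguished from identifiers here, so a real implementation would need to special-case them:

```javascript
// Hypothetical helper, not part of Peggy's API.
// Decide whether the "/" at src[slashIndex] starts a regex literal.
function slashStartsRegex(src, slashIndex) {
  // Walk back over whitespace to the previous significant character.
  let i = slashIndex - 1;
  while (i >= 0 && /\s/.test(src[i])) i--;

  // Nothing before the slash: it cannot be a division.
  if (i < 0) return true;

  // Previous character ends a number, an identifier or ")": division.
  // Anything else (including "}") is treated as the start of a regex.
  return !/[\w$)]/.test(src[i]);
}

console.log(slashStartsRegex('x.replace(/\\}/g, "}")', 10)); // true
console.log(slashStartsRegex('a / b', 2));                   // false
```

Because only one character of lookbehind is needed, the check does not have to work out whether a preceding `{ ... }` was an object or a block.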