refa icon indicating copy to clipboard operation
refa copied to clipboard

Backreference language sampling

Open RunDevelopment opened this issue 3 years ago • 0 comments

The JS parser is able to resolve backreferences where (among other conditions) the associated capturing group accepts a small finite language.

E.g. /(a|b)\1/ will be parsed as /aa|bb/

This is quite useful but it cannot handle capturing groups that accept infinitely many strings.

E.g. /(=+)a\1/ cannot be parsed.

However, in some use cases (such as static analysis), it might be enough to replace the backreferences with a sample of the language of the capturing group. The parse result will only approximate the input RegExp but this may be good enough for some use cases.

E.g. /(=+)a\1/ might get parsed as /=a=|==a==|===a===|====a====/ depending on the sampling. This isn't correct but it might be useful.

The sampling algorithm has to be provided by the user. The solution will always be imperfect and refa can't know which tradeoffs are acceptable.

RunDevelopment avatar Feb 25 '21 16:02 RunDevelopment