regexp-tree icon indicating copy to clipboard operation
regexp-tree copied to clipboard

\s optimization changes runtime semantics

Open RReverser opened this issue 7 years ago • 5 comments

As per ECMAScript spec, \s matches any WhiteSpace or LineTerminator character:

1248bc49-5e3a-45b7-aa45-8426aa28fd0a

Currently optimizer docs say it transforms [ \t\r\n\f] to \s (and the opposite), but this changes runtime semantics of the RegExp as in fact \s matches many more characters:

e42aae83-29e9-47ad-982e-e49315c9db56 94cf74b5-69da-47e2-a7d5-751b30af462d

I don't know if this is an intentional deviation and these are considered as edge cases that should be ignored, but it would be nice to make such "loose" transformations at least optional under a flag, as otherwise an optimized regexp can behave differently than original one, which is dangerous to use as an ESLint plugin or a transpiler.

RReverser avatar Jun 30 '17 12:06 RReverser

Good catch! In practice though, as you mentioned, I'd consider it as an edge case. The "loose" mode sounds interesting, or we can just handle a white/black list of needed transforms, and exclude some transform if it changes the semantics on practice.

DmitrySoshnikov avatar Jul 01 '17 07:07 DmitrySoshnikov

I just wonder if people ever put such an explicit list of whitespace characters but mean \s, or maybe safer to just remove this particular optimisation and assume people wrote this list of characters because they intended to? (up to you to decide, of course)

RReverser avatar Jul 03 '17 14:07 RReverser

Yeah, let’s not change runtime semantics.

mathiasbynens avatar Jul 03 '17 14:07 mathiasbynens

OK, need to think about it. Depending on how often the use-case is, we can either exclude the \s transform by default (and allow to opt-in), or, if it's very rare, to keep, and allow to opt-out when needed.

DmitrySoshnikov avatar Jul 04 '17 23:07 DmitrySoshnikov

or, if it's very rare, to keep, and allow to opt-out when needed

I don't think keeping it as-is is a very good idea, especially if it's rarely used, since then it can just bite silently much later in runtime.

RReverser avatar Jul 05 '17 22:07 RReverser