regexp-make-js
How are backreferences adjusted?
When regex instances are interpolated in blocks, the comment mentions "With back-references adjusted". What does that mean?
The tests don't really help me to understand this:
RegExp.make `^(#+)([^#\r\n]*)${ /\1/ }` == /^(#+)([^#\r\n]*)(?:\1)/
RegExp.make `(fo(o))${ /(x)\1(?:\2)/ }bar${ /\1/ }(baz)` == /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/
RegExp.make `^(${ /(.*)/ }\n(#+)\n${ /(.*)/ }\n\2)\n` == /^((?:(.*))\n(#+)\n(?:(.*))\n\3)\n/
RegExp.make `${ /\1/ }` == /(?:(?:))/
First of all, I don't understand what /\1/ is. If I read the spec (ES6, ES5) right, then this should throw a SyntaxError, as there are not enough NcapturingParens in the regex. If I test it in my browser (old Opera, FF), this is a valid expression however, which happily matches "\1" (yes, that's String.fromCharCode(1)).
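The browser behaviour described above can be checked directly. This is a minimal sketch, assuming an engine with the web-compatibility (Annex B) regex syntax, as found in browsers and Node.js; the variable names are only for illustration.

```javascript
// Assumes Annex B (web-compat) regex semantics for patterns without the `u` flag.
const noGroups = /\1/;     // no capturing group to bind to: \1 is read as an octal escape
const oneGroup = /(a)\1/;  // one capturing group: \1 is a real back-reference

const matchesControlChar = noGroups.test(String.fromCharCode(1));
const matchesRepeatedA = oneGroup.test('aa');
const matchesAThenControlChar = oneGroup.test('a' + String.fromCharCode(1));

// With the `u` flag the same pattern is rejected instead of reinterpreted.
let unicodeModeRejects = false;
try {
  new RegExp('\\1', 'u');
} catch (err) {
  unicodeModeRejects = err instanceof SyntaxError;
}

console.log(matchesControlChar, matchesRepeatedA, matchesAThenControlChar, unicodeModeRejects);
```

So the meaning of \1 depends on how many capturing groups the surrounding pattern has, which is exactly what makes interpolation tricky.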
Neither of these behaviours is reflected in the tests, though. Instead, they expect
- a "Back-reference not scoped to containing RegExp" to reference a group in the result regexp instead
- an "un-bindable back-reference" to be rewritten to a simple consume-nothing (?:)
which imo both collide with the goal that RegExp instances are treated like the set of substrings they match.
The rewriting of backreferences (both from the template, when "interrupted", and from the interpolation value, to reference the same group as before) seems reasonable in contrast.
Ah, I just came across this by chance: They are octal escape sequences, just like the ones in string literals. \0 to \7, \00 to \77, and \000 to \377 make single characters, not backreferences. A thing like /\9/ does however fail to match any input.
Are you satisfied with the handling of back-references?
Would more tests or changes to documentation help others avoid your initial confusion?
What are your thoughts about the semantic gap between
/\1/.exec('\u0001')
and
RegExp.make`()${/\1/}`.exec('\u0001')
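The gap can be reproduced without the library. This sketch assumes, based on the tests quoted earlier in this thread, that the interpolated form comes out as /()(?:\1)/; that regex is written by hand here as a stand-in rather than produced by RegExp.make.

```javascript
// Assumption: RegExp.make`()${/\1/}` yields /()(?:\1)/ (hand-written stand-in).
const standalone = /\1/;          // \1 is an octal escape for U+0001
const interpolated = /()(?:\1)/;  // \1 now binds to the empty group from the template

const standaloneMatch = standalone.exec('\u0001');
const interpolatedMatch = interpolated.exec('\u0001');

console.log(JSON.stringify(standaloneMatch[0]));    // "\u0001"
console.log(JSON.stringify(interpolatedMatch[0]));  // ""
```

The same interpolated value consumes one character on its own but nothing once it is placed after a capturing group.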
Closing. I don't think there's a point of disagreement or change requested here.
OK, I'd like to request a change for the tests to match the draft goals.
Or otherwise get an explanation of why RegExp.make behaves the way the current tests expect.
RegExp.make `^(#+)([^#\r\n]*)${ /\1/ }` /* should imo
become */ /^(#+)([^#\r\n]*)(?:\x01)/ /*
not */ /^(#+)([^#\r\n]*)(?:\1)/
RegExp.make `(fo(o))${ /(x)\1(?:\2)/ }bar${ /\1/ }(baz)` /* should imo
become */ /(fo(o))(?:(x)\3(?:\x02))bar(?:\x01)(baz)/ /*
not */ /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/
RegExp.make `${ /\1/ }` /* should imo
become */ /(?:\x01)/ /*
not */ /(?:(?:))/
In short: capturing groups and backreferences should only refer to each other within the same regex or template, and not clash with interpolation.
"unbindable backreferences" should either lead to a SyntaxError or be treated like an octal escape.
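A rough sketch of the rewriting proposed here, taking the octal-escape option. `rescopeBackrefs` is a hypothetical helper, not part of regexp-make-js, and its group counting is deliberately naive (it ignores character classes and named groups).

```javascript
// Hypothetical helper: rewrite the source of an interpolated regex so that its
// back-references only ever see its own capturing groups.
//   source       - the .source of the interpolated RegExp
//   groupsBefore - number of capturing groups that precede it in the result
function rescopeBackrefs(source, groupsBefore) {
  // Count the interpolated regex's own capturing groups, skipping escaped
  // parens and non-capturing (?...) forms. Naive: "(" inside [...] is miscounted.
  const ownGroups = (source.match(/\\[\s\S]|\((?!\?)/g) || [])
    .filter((tok) => tok === '(').length;

  return source.replace(/\\(\d+)|\\[\s\S]/g, (tok, digits) => {
    if (digits === undefined) return tok;  // any other escape: keep as-is
    const n = parseInt(digits, 10);
    if (n >= 1 && n <= ownGroups) {
      return '\\' + (n + groupsBefore);    // bindable: shift past the outer groups
    }
    // Unbindable: treat as an octal escape and spell it out as \xNN.
    if (/^[0-7]{1,3}$/.test(digits) && parseInt(digits, 8) <= 0o377) {
      return '\\x' + parseInt(digits, 8).toString(16).padStart(2, '0');
    }
    return tok;                            // e.g. \9: left untouched in this sketch
  });
}

const inner = rescopeBackrefs('(x)\\1(?:\\2)', 2);
const lone = rescopeBackrefs('\\1', 2);
console.log(inner);  // (x)\3(?:\x02)
console.log(lone);   // \x01
```

Throwing a SyntaxError in the unbindable branch instead would implement the other option.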
Fair enough. I'll note it in the doc. I think I probably agree with you, but there's no other way of specifying capturing groups right now in strict mode code since `\1` is an octal escape.
Oh, I didn't realize this was done because RegExp.make `(1) \1` doesn't work. Maybe we'd need to relax the template string syntax to allow this, and throw only when the string values are accessed (but not when only .raw is used)?
I'd guess that /\1/ should actually throw as well in strict mode instead of being interpreted as an octal escape.
That wasn't the original reason I did it. I didn't read the spec closely enough to realize that /\1/ is equivalent to /\x01/.