regexp-make-js

How are backreferences adjusted?

Open bergus opened this issue 10 years ago • 7 comments

When regex instances are interpolated in blocks, the comment mentions "With back-references adjusted". What does that mean?

The tests don't really help me to understand this:

RegExp.make `^(#+)([^#\r\n]*)${ /\1/ }`                  == /^(#+)([^#\r\n]*)(?:\1)/
RegExp.make `(fo(o))${ /(x)\1(?:\2)/ }bar${ /\1/ }(baz)` == /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/
RegExp.make `^(${ /(.*)/ }\n(#+)\n${ /(.*)/ }\n\2)\n`    == /^((?:(.*))\n(#+)\n(?:(.*))\n\3)\n/
RegExp.make `${ /\1/ }`                                  == /(?:(?:))/
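
To make the second expectation concrete, here is a minimal sketch that uses only plain regex literals (no RegExp.make call), assuming an engine with the legacy Annex B regex syntax that browsers and Node implement. It compares the expected result regex with the interpolated regex on its own; the variable names and sample strings are made up for illustration.

```javascript
// The regex that the second test expects RegExp.make to produce.
const combined = /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/;
// The interpolated regex on its own: it has one group, so \1 is a
// back-reference while \2 falls back to the octal escape for U+0002.
const standalone = /(x)\1(?:\2)/;

// In the combined regex, (x) is group 3, so the original \1 became \3,
// while \2 and the later \1 now bind to the template's groups "o" and "foo".
const combinedMatch = combined.exec('fooxxobarfoobaz');

// On its own, the interpolated regex wants U+0002 after "xx", not "o".
const standaloneOctal = standalone.test('xx\u0002');
const standaloneLetter = standalone.test('xxo');

console.log(combinedMatch !== null, standaloneOctal, standaloneLetter);
```

So the string "xxo" is matched by the interpolated part of the combined regex but not by the interpolated regex alone, which is the mismatch questioned in this issue.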

First of all, I don't understand what /\1/ is. If I read the spec (ES6, ES5) right, then this should throw a SyntaxError, as there are not enough NcapturingParens in the regex. If I test it in my browser (old Opera, FF), this is a valid expression however, which happily matches "\1" (yes, that's String.fromCharCode(1)).
Neither of these behaviours is reflected in the tests, though. Instead, they do expect

  • "Back-reference not scoped to containing RegExp" but instead referencing a group in the result regexp
  • "un-bindable back-reference" to be rewritten to a simple consume-nothing (?:)

which imo both collide with the goal that

RegExp instances are treated like the set of substrings they match

The rewriting of backreferences (both from the template, when "interrupted", and from the interpolation value, to reference the same group as before) seems reasonable in contrast.

bergus avatar Aug 12 '15 05:08 bergus

Ah, I just came across this by chance: They are octal escape sequences, just like the ones in string literals. \0 to \7, \00 to \77, and \000 to \377 make single characters, not backreferences. A thing like /\9/ does however fail to match any input.
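
A quick check of the octal reading, as a sketch that assumes an engine implementing the legacy (Annex B) regex grammar, such as current browsers or Node. The variable names are illustrative only.

```javascript
// Without capturing groups, \1 is the character U+0001, not a back-reference.
const matchesControlChar = /\1/.test('\u0001');

// Multi-digit escapes are read as octal too: \12 is octal 12, i.e. U+000A (newline).
const matchesNewline = /\12/.test('\n');

// With the "u" flag the legacy fallback is disabled and the pattern is rejected.
let unicodeModeThrows = false;
try {
  new RegExp('\\1', 'u');
} catch (e) {
  unicodeModeThrows = e instanceof SyntaxError;
}

console.log(matchesControlChar, matchesNewline, unicodeModeThrows);
```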

bergus avatar Aug 12 '15 22:08 bergus

Are you satisfied with the handling of back-references?

Would more tests or changes to documentation help others avoid your initial confusion?

What are your thoughts about the semantic gap between

/\1/.exec('\u0001')

and

RegExp.make`()${/\1/}`.exec('\u0001')
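
The gap can be reproduced without the library. This sketch assumes that the interpolation above yields something like /()(?:\1)/, following the test expectations quoted at the top of the issue; that output is an assumption, not something verified against RegExp.make.

```javascript
// No groups: \1 is an octal escape for the character U+0001.
const literal = /\1/;
// Assumed result of RegExp.make`()${/\1/}`: \1 now refers to the empty group.
const assumed = /()(?:\1)/;

const literalMatch = literal.exec('\u0001'); // consumes the control character
const assumedMatch = assumed.exec('\u0001'); // matches the empty string at index 0

console.log(literalMatch[0].length, assumedMatch[0].length);
```

Both calls succeed, but the first consumes one character while the second consumes none, so the interpolated regex no longer matches the same set of substrings.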

mikesamuel avatar Sep 03 '15 14:09 mikesamuel

Closing. I don't think there's a point of disagreement or change requested here.

mikesamuel avatar Oct 13 '15 18:10 mikesamuel

OK, I'd like to request a change for the tests to match the draft goals. Or otherwise get an explanation of how RegExp.make is supposed to behave in the current tests.

RegExp.make `^(#+)([^#\r\n]*)${ /\1/ }` /* should imo
 become  */ /^(#+)([^#\r\n]*)(?:\x01)/ /*
 not     */ /^(#+)([^#\r\n]*)(?:\1)/

RegExp.make `(fo(o))${ /(x)\1(?:\2)/ }bar${ /\1/ }(baz)` /* should imo
 become  */ /(fo(o))(?:(x)\3(?:\x02))bar(?:\x01)(baz)/ /*
 not     */ /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/

RegExp.make `${ /\1/ }` /* should imo
 become  */ /(?:\x01)/ /*
 not     */ /(?:(?:))/

In short: capturing groups and backreferences should only refer to each other within the same regex or template, and not clash with interpolation. "unbindable backreferences" should either lead to a SyntaxError or be treated like an octal escape.
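
One way the "treated like an octal escape" option could be sketched: before an interpolated regex is spliced in, rewrite every decimal escape that exceeds its own group count into the equivalent \xHH escape, so it can no longer bind to a group of the surrounding template. Both helpers below are hypothetical and not part of regexp-make-js; the sketch assumes legacy (non-"u") syntax and does not renumber bindable back-references.

```javascript
// Hypothetical helpers for illustration; not part of regexp-make-js.

// Count capturing groups in a regex source, skipping escapes and character classes.
function countCapturingGroups(source) {
  let count = 0;
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') { i++; continue; }          // skip the escaped character
    if (inClass) { if (ch === ']') inClass = false; continue; }
    if (ch === '[') { inClass = true; continue; }
    if (ch !== '(') continue;
    const rest = source.slice(i + 1);
    // "(" captures unless followed by "?", except for named groups "(?<name>".
    if (!rest.startsWith('?') || /^\?<[^=!]/.test(rest)) count++;
  }
  return count;
}

// Replace \N with \xHH when N exceeds the regex's own group count.
function rewriteUnbindable(source) {
  const groups = countCapturingGroups(source);
  let out = '';
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch !== '\\') { out += ch; continue; }
    const rest = source.slice(i + 1);
    const decimal = /^[1-9]\d*/.exec(rest);
    // Legacy octal escapes: at most three digits, value at most 0o377.
    const octal = /^(?:[0-3][0-7]{0,2}|[4-7][0-7]?)/.exec(rest);
    if (decimal && octal && Number(decimal[0]) > groups) {
      const code = parseInt(octal[0], 8);
      out += '\\x' + code.toString(16).padStart(2, '0');
      i += octal[0].length;
    } else {
      out += ch + (source[i + 1] ?? '');         // copy any other escape unchanged
      i++;
    }
  }
  return out;
}

const isolatedSingle = rewriteUnbindable(/\1/.source);
const isolatedMixed = rewriteUnbindable(/(x)\1(?:\2)/.source);
console.log(isolatedSingle, isolatedMixed);
```

Here /\1/ would be spliced in as \x01, and /(x)\1(?:\2)/ keeps its own \1 (still to be renumbered by the caller) while \2 becomes \x02, matching the "should become" lines above.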

bergus avatar Oct 13 '15 18:10 bergus

Fair enough. I'll note it in the doc. I think I probably agree with you, but there's no other way of specifying capturing groups right now in strict mode code since

`\1`

is an octal escape.
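
The restriction can be observed by parsing small snippets at runtime. This is only a sketch: the helper name `parses` is made up, and the snippets go through the Function constructor so that a parse error can be caught instead of aborting the whole script.

```javascript
// Hypothetical helper: does this function body parse?
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// An untagged template literal containing \1 is rejected by the parser.
const untaggedTemplateParses = parses('return `(a)\\1`;');
// The same text in a regex literal is fine, even in strict mode code.
const regexLiteralParses = parses('"use strict"; return /(a)\\1/;');

console.log(untaggedTemplateParses, regexLiteralParses);
```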

mikesamuel avatar Oct 14 '15 14:10 mikesamuel

Oh, I didn't realize this was done because RegExp.make `(1) \1` doesn't work. Maybe we'd need to relax the template string syntax to allow this, and throw only when the string values are accessed (but not when only .raw is used)?
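
ECMAScript 2018 later relaxed tagged templates along these lines: a segment with an invalid escape gets an undefined cooked value instead of being rejected, and its raw text stays available. The sketch below assumes an engine with that revision; `rawSource` is a made-up tag for illustration and not how RegExp.make is implemented.

```javascript
// Hypothetical tag that only looks at the first template segment.
function rawSource(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// Parses in ES2018+ engines because the template is tagged.
const parts = rawSource`(a)\1`;

// cooked is undefined (invalid escape), raw is the six characters (a)\1
const fromRaw = new RegExp(parts.raw);
const backrefWorks = fromRaw.test('aa');

console.log(parts.cooked, parts.raw, backrefWorks);
```

A tag that reads only .raw can therefore accept \1 as a back-reference written directly in the template.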

I'd guess that /\1/ should actually throw as well in strict mode instead of being interpreted as an octal escape.

bergus avatar Oct 14 '15 15:10 bergus

That wasn't the original reason I did it. I didn't read the spec closely enough to realize that /\1/ is equivalent to /\x01/

mikesamuel avatar Oct 14 '15 15:10 mikesamuel