regex icon indicating copy to clipboard operation
regex copied to clipboard

Underscore in replacement string treated as backspace

Open hugoduncan opened this issue 9 years ago • 9 comments

Using a _ in the replacement string of Regex::replace leads to unexpected behaviour. The _ seems to be treated as a backspace. The documentation should either make mention of this, or this seems to be a bug.

#[test]
fn replacement_with_underscore() {
    let re = regex!(r"(.)(.)");
    let s1 = re.replace("ab","$1-$2");
    let s2 = re.replace("ab","$1_$2");
    assert_eq!("a-b", &s1);
    assert_eq!("a_b", &s2); // Fails here "a_b" != "b"
}

hugoduncan avatar Apr 13 '15 02:04 hugoduncan

Interesting. It seems that _ is considered part of the capture group name. Since a group with that name doesn't exist, it is replaced with an empty string.

Definitely a bug. Not entirely sure what the right fix is. What is the simplest thing we can do?

BurntSushi avatar Apr 13 '15 03:04 BurntSushi

What is the simplest thing we can do?

Define that the names of capture groups cannot start with numbers? (i.e. do something similar to Rust identifiers.)

huonw avatar Apr 13 '15 04:04 huonw

This bug seems to still be there.

let re = Regex::new(r"(.)(.)").unwrap();
let s1 = re.replace("ab","$1-$2");
let s2 = re.replace("ab","$1_$2");
assert_eq!("a-b", &s1);
assert_eq!("a_b", &s2); // Fails here "a_b" != "b"

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=fa837d651980121ff786eae319965fbc

ekse avatar Jan 17 '19 21:01 ekse

Ah I see, it looks like I only fixed the first half of this, which was to make capture group names that start with a number illegal. The latter half, which I forgot to do, is to fix expansion such that it treats capture group names starting with a number in the replacement correctly. Thanks for catching this!

BurntSushi avatar Jan 17 '19 23:01 BurntSushi

Since the docs weren't updated either, I'm marking this as something to be fixed in a regex2 some day. Fixing this now would be a breaking change. We'll just have to live with the wart for the time being.

BurntSushi avatar Jun 09 '19 12:06 BurntSushi

Ugh, this is an unfortunate mistake. The "longest possible name" behavior pessimizes replacement strings that use numeric group references, so simple cases like $1asd don't work without {}. There's no benefit to this behavior, as there can be no ambiguity: capture groups starting with numbers have been disallowed now.

It also differs from the behavior of all other regex engines I've used (Java, JavaScript, Python, .NET, IntelliJ IDEA, probably more), so no doubt will catch many people besides me by surprise.

intgr avatar Jun 28 '21 00:06 intgr

I think using braces to invoke the non-just-number variable will be a better idea.

  • $1asd = ${1}asd;
  • $asd = Just an error;
  • ${as} = OK.

PeterlitsZo avatar Jun 02 '23 12:06 PeterlitsZo

@PeterlitsZo The issue here isn't "what should it do instead," but that changing it is a breaking change and I have no plans to do a breaking change release any time soon.

BurntSushi avatar Jun 02 '23 13:06 BurntSushi

@PeterlitsZo The issue here isn't "what should it do instead," but that changing it is a breaking change and I have no plans to do a breaking change release any time soon.

Thanks for your reply. I just mean that MAYBE it is useful in regex2 in the future. I am sorry if I comment on the wrong issue.

PeterlitsZo avatar Jun 02 '23 14:06 PeterlitsZo