sd icon indicating copy to clipboard operation
sd copied to clipboard

Bug in replace? Text is removed

Open Tragen opened this issue 5 years ago • 2 comments

I'm using the Windows version 0.65 and this is the command. It's working with many other tools but not with sd.

set pattern="(.*)Call ((List|Text)\d+_Key(Down|Up))\((.*), (vbKeyReturn|13), (.*)\)"
set replace="$1Call $2($5, GetFM20ReturnKey(), $6)"
sd.exe %pattern% %replace% "Test.txt"

Test.txt contains this Text

Call Text3_KeyDown(Index, vbKeyReturn, 0)
    Call Text3_KeyDown(Index, vbKeyReturn, 0)
        Call Text3_KeyDown(Index, vbKeyReturn, 0)
Call Text3_KeyDown(Index, vbKeyReturn, 0)

It gets replaced with

 Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
 Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
 Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
 Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)

I'm Missing the first capture group and the Call word. Can you tell me if I'm doing something wrong or if it's a bug. As other tools are working with the same pattern and replace strings, I think it's a bug in sd.

Tragen avatar Jun 27 '19 00:06 Tragen

@Tragen since sd supports more than index-based variable names (see third example), in your case $1Call was treated as a variable.

Just replace it with ${1}Call to resolve the ambiguity.

chmln avatar Jun 28 '19 20:06 chmln

Thanks for the workaround. Still looks like a bug to me. I don't use named capture groups and even then, variables shouldn't start with a number, then you don't have ambiguity.

Tragen avatar Jun 29 '19 20:06 Tragen

Ran into same problem, have the same opinion, but I am glad to hear this was at least taken into consideration and not a hidden bug.

jeff-hykin avatar Jan 30 '23 23:01 jeff-hykin

The current behavior follows the regex crate which should generally be our source of truth for these kinds of situations. It's outlined here

https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace

...

The longest possible name is used. e.g., $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

...

I do think we should document this caveat more clearly though, so I'll do that tomorrow

CosmicHorrorDev avatar May 11 '23 07:05 CosmicHorrorDev

Maybe there's a thread in the regex crate to discuss it. Probably hard to change the upstream now.

As someone with a lot of regex experience across Perl, Python, Ruby, Javascript, Oniguruma, Grep, Sed, etc I've never had an issue figuring out small differences like \1 instead of $1 or \k instead of \g, but it did take me quite a while to figure out what I was supposed to do. Mostly because "sometimes" the $1 worked and other times it didn't and the pattern was not obvious.

Documentation would help a bit but honestly, given how common bash/shell is I really doubt any first time user is going to see $1Call and think that it's one token. I don't know of any language that allows variables to start with a number.

I feel like a warning would be the most useful thing. Meaning, if a replace involves a capture group that doesn't exist, and the group starts with a number, just explain the whole situation of how to refer to numbered capture groups.

jeff-hykin avatar May 11 '23 16:05 jeff-hykin

Maybe there's a thread in the regex crate to discuss it. Probably hard to change the upstream now.

It dates all the way back to an early regex issue

https://github.com/rust-lang/regex/issues/69

It looks like constructing a named capture starting with an integer is already disallowed by the regex crate, so I'm currently leaning towards having a hard error if the replacement text has a named capture group that starts with a number. The way to resolve it will still be to disambiguate the group, but this way the user will be immediately aware of what the issue is

This will be a breaking change, but I was already planning on making other breaking changes in the next release, so that's fine. Thanks for staying adamant @jeff-hykin

CosmicHorrorDev avatar May 11 '23 18:05 CosmicHorrorDev

Glad to hear about it!

with old school tools like sed being basically frozen, I feel like its the job of this next generation of CLI tools to be as intuitive as possible before they also eventually become frozen.

jeff-hykin avatar May 16 '23 02:05 jeff-hykin

I comment like this: https://github.com/rust-lang/regex/issues/69#issuecomment-1573680606:

I think using braces to invoke the non-just-number variable will be a better idea.

  • $1asd = ${1}asd;
  • $asd = Just an error;
  • ${as} = OK.

Do you think it is a better idea?

PeterlitsZo avatar Jun 02 '23 12:06 PeterlitsZo

Finally got around to handling this. I opted for a hard error when the user passes a capture group name that would be invalid. Here is what the error for the original usage looks like now

image

This means that we're now just a bit more strict than the other regex-related Rust tools that people are familiar with which I think is perfectly fine

CosmicHorrorDev avatar Oct 28 '23 20:10 CosmicHorrorDev

Perfect 👌 thanks!

jeff-hykin avatar Oct 30 '23 12:10 jeff-hykin