sd
sd copied to clipboard
Bug in replace? Text is removed
I'm using the Windows version 0.65 and this is the command. It's working with many other tools but not with sd.
set pattern="(.*)Call ((List|Text)\d+_Key(Down|Up))\((.*), (vbKeyReturn|13), (.*)\)"
set replace="$1Call $2($5, GetFM20ReturnKey(), $6)"
sd.exe %pattern% %replace% "Test.txt"
Test.txt contains this Text
Call Text3_KeyDown(Index, vbKeyReturn, 0)
Call Text3_KeyDown(Index, vbKeyReturn, 0)
Call Text3_KeyDown(Index, vbKeyReturn, 0)
Call Text3_KeyDown(Index, vbKeyReturn, 0)
It gets replaced with
Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
Text3_KeyDown(Index, GetFM20ReturnKey(), vbKeyReturn)
I'm Missing the first capture group and the Call word. Can you tell me if I'm doing something wrong or if it's a bug. As other tools are working with the same pattern and replace strings, I think it's a bug in sd.
@Tragen since sd
supports more than index-based variable names (see third example), in your case $1Call
was treated as a variable.
Just replace it with ${1}Call
to resolve the ambiguity.
Thanks for the workaround. Still looks like a bug to me. I don't use named capture groups and even then, variables shouldn't start with a number, then you don't have ambiguity.
Ran into same problem, have the same opinion, but I am glad to hear this was at least taken into consideration and not a hidden bug.
The current behavior follows the regex
crate which should generally be our source of truth for these kinds of situations. It's outlined here
https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace
...
The longest possible name is used. e.g.,
$1a
looks up the capture group named1a
and not the capture group at index1
. To exert more precise control over the name, use braces, e.g.,${1}a
....
I do think we should document this caveat more clearly though, so I'll do that tomorrow
Maybe there's a thread in the regex crate to discuss it. Probably hard to change the upstream now.
As someone with a lot of regex experience across Perl, Python, Ruby, Javascript, Oniguruma, Grep, Sed, etc I've never had an issue figuring out small differences like \1 instead of $1 or \k instead of \g, but it did take me quite a while to figure out what I was supposed to do. Mostly because "sometimes" the $1 worked and other times it didn't and the pattern was not obvious.
Documentation would help a bit but honestly, given how common bash/shell is I really doubt any first time user is going to see $1Call
and think that it's one token. I don't know of any language that allows variables to start with a number.
I feel like a warning would be the most useful thing. Meaning, if a replace involves a capture group that doesn't exist, and the group starts with a number, just explain the whole situation of how to refer to numbered capture groups.
Maybe there's a thread in the regex crate to discuss it. Probably hard to change the upstream now.
It dates all the way back to an early regex issue
https://github.com/rust-lang/regex/issues/69
It looks like constructing a named capture starting with an integer is already disallowed by the regex
crate, so I'm currently leaning towards having a hard error if the replacement text has a named capture group that starts with a number. The way to resolve it will still be to disambiguate the group, but this way the user will be immediately aware of what the issue is
This will be a breaking change, but I was already planning on making other breaking changes in the next release, so that's fine. Thanks for staying adamant @jeff-hykin
Glad to hear about it!
with old school tools like sed being basically frozen, I feel like its the job of this next generation of CLI tools to be as intuitive as possible before they also eventually become frozen.
I comment like this: https://github.com/rust-lang/regex/issues/69#issuecomment-1573680606:
I think using braces to invoke the non-just-number variable will be a better idea.
$1asd
=${1}asd
;$asd
= Just an error;${as}
= OK.
Do you think it is a better idea?
Finally got around to handling this. I opted for a hard error when the user passes a capture group name that would be invalid. Here is what the error for the original usage looks like now
This means that we're now just a bit more strict than the other regex-related Rust tools that people are familiar with which I think is perfectly fine
Perfect 👌 thanks!