sd
Greedy/non-greedy issue
Using the latest precompiled binary on Ubuntu 20.04.
Given this input file:
cat > f <<EOF
Dummy line
Key: Value
EOF
I'm stumbling upon this weird behaviour:
sd '^([^:]+):' '`$1`:' < f
gives
`Dummy line
Key`: Value
while I expected
Dummy line
`Key`: Value
The desired output is obtained by adding '\n' to the exclusion set:
sd '^([^:\n]+):' '`$1`:' < f
but the \n shouldn't be needed here.
I suspect it's the same bug that other people have reported here and there with various tools written in Rust.
@sergeevabc, if you don't mind me asking, would you please confirm that the problem I'm reporting has the same origin as the others, in your opinion?
If that's the case, it's not reassuring at all, since I depend heavily on the reliability of my regex-related tools.
Notably, Ripgrep doesn't seem to be affected:
rg '^([^:]+):' -r '`$1`:' f
2:`Key`: Value
The difference is that ripgrep applies the pattern line by line, whereas sd applies it to the entire input. Since [^:] also matches a newline, a match that starts at the beginning of the first line runs across the line break up to the first colon. If you take that into account, the regex works as intended. That said, sd could perhaps introduce an option to apply patterns line by line.
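To illustrate the whole-input versus line-by-line difference, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: it is not sd's Rust regex engine, and `re.MULTILINE` is used here to let `^` match at line starts. The variable names are hypothetical.

```python
import re

# Same content as the file `f` from the report.
text = "Dummy line\nKey: Value\n"

# Whole-input matching: [^:] also matches "\n", so the match that starts
# at the beginning of the input runs across the line break to the first colon.
whole = re.sub(r"^([^:]+):", r"`\1`:", text, flags=re.MULTILINE)

# Excluding "\n" from the character class keeps every match on a single line.
fixed = re.sub(r"^([^:\n]+):", r"`\1`:", text, flags=re.MULTILINE)

# Line-by-line application: each line is substituted on its own,
# so a match can never span two lines.
per_line = "".join(
    re.sub(r"^([^:]+):", r"`\1`:", line)
    for line in text.splitlines(keepends=True)
)

print(repr(whole))
print(repr(fixed))
print(repr(per_line))
```

In this sketch, `whole` reproduces the surprising output from the report, while `fixed` and `per_line` both give the expected one.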
As mentioned, this issue occurs because sd allows multi-line matching and + is greedy. You can fix this specific case by making the repetition ungreedy with +? instead of excluding newlines:
$ echo 'Dummy line\nKey: Value' | sd '^([^:]+?):' '`$1`:'
Dummy line
`Key`: Value
An option for applying regex patterns line by line sounds reasonable too :+1: