Greedy/non-greedy issue

Open ngirard opened this issue 4 years ago • 2 comments

Using latest precompiled binary on Ubuntu 20.04.

Using

cat > f <<EOF
Dummy line
Key: Value
EOF

I'm stumbling upon this weird behaviour: sd '^([^:]+):' '`$1`:' < f gives

`Dummy line
Key`: Value

while I expected

Dummy line
`Key`: Value

The desired output is obtained by adding '\n' to the exclusion set: sd '^([^:\n]+):' '`$1`:' < f

but the \n shouldn't be needed here. I'm suspecting it's the same bug that other people have reported here and there with various tools written in Rust.

@sergeevabc, if you don't mind me asking, would you please confirm that the problem I'm reporting has the same origin as the others, in your opinion?

If that's the case, that's not reassuring at all, since I'm heavily depending on the reliability of my regex-related tools.

Notably, Ripgrep doesn't seem to be affected:

rg '^([^:]+):' -r '`$1`:' f
2:`Key`: Value

ngirard, Apr 29 '21 19:04

The difference is that ripgrep applies the pattern line by line, whereas sd applies it to the entire file. On the whole file, [^:]+ also matches newline characters, so the first match runs from the start of the file to the first colon. Taking that into account, the regex works as intended. That said, sd could introduce an option to apply patterns line by line.
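
The whole-file versus line-by-line difference can be reproduced without sd or ripgrep. The sketch below is only an illustration under stated assumptions: perl in slurp mode (-0777) stands in for a tool that sees the entire input at once, and sed stands in for a tool that handles one line at a time. Neither is meant to describe how sd or ripgrep is implemented.

```shell
# Recreate the two-line input file from the report.
printf 'Dummy line\nKey: Value\n' > f

# Whole-input matching (stand-in: perl slurp mode; /m lets ^ match at line starts).
# [^:] also matches "\n", so the match that starts at offset 0 crosses the newline.
whole=$(perl -0777 -pe 's/^([^:]+):/`$1`:/m' f)

# Line-by-line matching (stand-in: sed, which processes each line separately).
# Line 1 contains no colon, so only line 2 is rewritten.
per_line=$(sed -E 's/^([^:]+):/`\1`:/' f)

printf 'whole input:\n%s\n\nline by line:\n%s\n' "$whole" "$per_line"
```

The first command wraps "Dummy line" plus "Key" in a single pair of backticks, as in the report, while the second wraps only "Key".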

Linus789, May 05 '21 21:05

As mentioned, this issue occurs because sd allows multi-line regex and + is greedy. You can fix this specific case by making the repetition ungreedy with +? instead of negating newlines:

$ echo 'Dummy line\nKey: Value' | sd '^([^:]+?):' '`$1`:'
Dummy line
`Key`: Value

An option for applying regex patterns line by line sounds reasonable too :+1:

CosmicHorrorDev, May 11 '23 18:05