
NUL is not handled correctly

Open NightMachinery opened this issue 5 years ago • 3 comments

I am trying to use sd to remove some values from a NUL-separated file, but it does nothing to the file.

# zsh
sd --flags m --string-mode $'\0'"$i" '' "$attic"
# cat -v $attic
^@this is 1.
this is 2.
this is 3.
^@hi
^@blue boy
# cat -v <<<"$i"
this is 1.
this is 2.
this is 3.

NightMachinery, Aug 07 '19 08:08

It is not possible to pass a NUL byte in an argument to a command that is executed (the same goes for environment variables), because the execve() system call receives arguments and environment entries as NUL-terminated C strings: each one ends at its first NUL.

In zsh, passing $'\0' only works for commands that are not executed as a separate program, such as builtins and functions.

Compare:

$ printf '%q\n' $'xx\0yy'
xx$'\0'yy
$ /usr/bin/printf '%q\n' $'xx\0yy'
xx

Everything past the NUL was discarded by execve() for the non-builtin printf.

That's a limitation of the kernel API, not of sd nor zsh.

stephane-chazelas, Aug 07 '19 11:08
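The flip side of the limitation described above is that an escape sequence passed as ordinary text survives execve() untouched and can be decoded by the program once it is running; that is the only way an escape like \x00 can reach a regex engine. A small sketch using the external printf, invoked through /usr/bin/env so the shell builtin is bypassed (the /usr/bin/env path is an assumption about the system):

```shell
# '\0' is two ordinary bytes (backslash, zero), so it passes through
# execve() intact. With %s the external printf prints it verbatim...
as_text=$(/usr/bin/env printf '%s' '\0' | wc -c | tr -d ' ')

# ...but when the same two bytes appear in the format string, printf itself
# decodes them into a single NUL byte after it has started running.
as_nul=$(/usr/bin/env printf '\0' | wc -c | tr -d ' ')

# Confirm that the one byte really is NUL: tr -cd deletes every other byte.
nul_only=$(/usr/bin/env printf '\0' | tr -cd '\000' | wc -c | tr -d ' ')

echo "verbatim: $as_text bytes; decoded: $as_nul byte; NUL bytes: $nul_only"
```

With GNU or BSD printf this should report 2, 1 and 1: the NUL only comes into existence inside the executed program.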

@stephane-chazelas Oh, I certainly didn't know that. But there should still be a way to support NUL in sd, like perl does: some escape sequence or something.

NightMachinery, Aug 08 '19 10:08


Yes, Rust's regex crate seems to support \x00, but it doesn't support Perl/PCRE's \Q...\E quoting, so you won't be able to do things like

sd "\x00\Q$i\E" ""

(which would still not help if $i contains \E btw) like you would

perl -0777 -pe 's/\0\Q$ENV{i}\E//g'

In the limited testing I did yesterday, I found rust/sd regex pretty limited compared to the ones I'm used to in perl/python (no look around, no \h, no back-reference).

stephane-chazelas, Aug 08 '19 10:08
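Lacking \Q...\E, one workaround is to backslash-escape the regex metacharacters in the needle before splicing it into the pattern. A minimal sketch in POSIX sh; the helper name rx_quote is hypothetical (not part of sd), and the metacharacter set is an assumption meant to match what the Rust regex crate treats as special. Because the escaping is applied to the text of $i itself rather than delimited by \E, a literal \E inside $i does no harm.

```shell
# Hypothetical helper: prefix a backslash to every character assumed to be
# a Rust regex metacharacter, so the result matches its input literally.
rx_quote() {
  local s="$1" out='' rest c meta='\.+*?()|[]{}^$#&-~'
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character (may be a newline)
    s=$rest
    case $meta in
      *"$c"*) out="$out\\$c" ;;   # metacharacter: escape it
      *) out="$out$c" ;;          # anything else is copied as-is
    esac
  done
  printf '%s' "$out"
}

# Example use (requires sd; $i and $attic as in the issue, regex mode):
#   q=$(rx_quote "$i"; printf x); q=${q%x}   # keep trailing newlines in $i
#   sd "\x00$q" '' "$attic"
```

The trailing printf x in the usage line stops command substitution from stripping a trailing newline that may be part of $i.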

Going to close this now, since \x00 seems to be the appropriate way to handle this. I'm considering adding a flag to use PCRE2 as the regex engine, which should allow for more functionality.

Quoting @stephane-chazelas: "I found rust/sd regex pretty limited compared to the ones I'm used to in perl/python (no look around, no \h, no back-reference)."

Those omissions were an intentional design decision of the regex crate: without features such as look-around and back-references, matching time stays bounded by O(len_of_regex * len_of_text), which helps avoid things like ReDoS attacks.

CosmicHorrorDev, May 11 '23 07:05
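Following the \x00 suggestion above, the NUL travels to sd as the four characters \x00 and is decoded by its regex engine, so the command has to run in regex mode; with --string-mode the same four characters would be searched for literally. A minimal sketch, assuming sd is on PATH (it only prints the command otherwise) and using a hypothetical sample file modeled on the one in the issue. The needle hi contains no regex metacharacters, so it needs no escaping.

```shell
# Hypothetical sample file: records introduced by NUL bytes. printf decodes
# \0 in its format string, so no NUL appears in any argument.
attic=$(mktemp)
trap 'rm -f "$attic"' EXIT
printf '\0this is 1.\nthis is 2.\nthis is 3.\n\0hi\n\0blue boy\n' > "$attic"

# Single quotes keep the backslashes: sd receives the text \x00hi\n and its
# regex engine turns \x00 into NUL and \n into a newline.
pattern='\x00hi\n'

if command -v sd >/dev/null 2>&1; then
  # Regex mode (no --string-mode): delete the NUL-prefixed "hi" record.
  sd "$pattern" '' "$attic"
else
  printf "sd not found; would run: sd '%s' '' '%s'\n" "$pattern" "$attic" >&2
fi

# Show the result the way the issue does (cat -v renders NUL as ^@).
cat -v "$attic"
```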