coreutils
Discrepancy in output length with special characters
With:
echo "mÃ" > /tmp/a
With our implementation:
$ cat /tmp/a| ./target/debug/coreutils wc -L -l
1 2
With GNU:
$ cat /tmp/a| /usr/bin/wc -L -l
1 1
Found with the wc differential fuzzer.
I can't reproduce this issue locally. GNU wc returns the same result as uutils wc:
$ echo "mÃ" > /tmp/a
$ cat /tmp/a | /usr/bin/wc -L -l
1 2
Can you try with a different locale, e.g. LANG=C?
I tried LC_ALL=C, LANG=C, and LC_COLLATE=C, but it made no difference. Based on the documentation for -L, I would expect it to return 2 (and not 1), because the line contains two characters.
@sylvestre Do you still have the file somewhere? I suspect that this is just an encoding issue, because I can't reproduce it either. For reference, here's my /tmp/a:
$ hexdump -C /tmp/a
00000000 6d c3 83 0a |m...|
00000004
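To illustrate how the same four bytes could yield either 2 or 1, here is a minimal Python sketch of two counting models. Both functions are hypothetical illustrations, not code from GNU wc or uutils wc. The first decodes the line as UTF-8 and counts one column per character (a simplification of real display width). The second works byte by byte and counts only printable ASCII, which is an assumption about how a non-UTF-8 locale might treat the bytes c3 and 83.

```python
# Bytes from the hexdump above: 'm', U+00C3 encoded as UTF-8 (c3 83), newline.
DATA = bytes([0x6D, 0xC3, 0x83, 0x0A])


def max_len_utf8(data: bytes) -> int:
    """Longest line, counting decoded UTF-8 characters as one column each."""
    text = data.decode("utf-8")
    return max((len(line) for line in text.split("\n")), default=0)


def max_len_ascii_printable(data: bytes) -> int:
    """Longest line, counting only printable ASCII bytes (assumed byte-wise model)."""
    return max(
        (sum(1 for b in line if 0x20 <= b <= 0x7E) for line in data.split(b"\n")),
        default=0,
    )


if __name__ == "__main__":
    print("UTF-8 character model:", max_len_utf8(DATA))
    print("Printable-ASCII byte model:", max_len_ascii_printable(DATA))
```

Under these assumptions the first model gives 2 and the second gives 1, matching the two results reported in this thread. Whether either model corresponds to what the two wc builds actually do would need to be checked against the file and locale used in the original report.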