
Discrepancy in output length with special characters

Status: Open. sylvestre opened this issue 1 year ago (7 comments).

To reproduce, create the test file:

echo "mÃ" > /tmp/a

With our implementation:

$ cat /tmp/a | ./target/debug/coreutils wc -L -l
      1      2

With GNU:

$ cat /tmp/a | /usr/bin/wc -L -l
      1      1

Found with the wc differential fuzzer.

sylvestre, Jan 13 '24 12:01

I can't reproduce this issue locally. GNU wc returns the same result as uutils wc:

$ echo "mÃ" > /tmp/a
$ cat /tmp/a | /usr/bin/wc -L -l
      1       2

cakebaker, Jan 13 '24 13:01

Try with a different locale? For example, with LANG=C?

sylvestre, Jan 13 '24 13:01

I tried LC_ALL=C, LANG=C, and LC_COLLATE=C, but there was no difference. Based on the documentation of -L, I would expect it to return 2 (not 1), because the line contains two characters.
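The difference between the two results can be illustrated with a minimal sketch of two ways to measure the longest line. Both function names are hypothetical. `max_line_chars` counts Unicode scalar values, which matches the expectation of 2 for "mÃ" (it is only an approximation of display width, since wide or zero-width characters are not handled). `max_line_printable_bytes` is an assumed byte-at-a-time counter where only printable ASCII bytes advance the column, the way 0xC3 and 0x83 would be treated as non-printable in a single-byte locale. It shows one way a result of 1 could arise; it is not a diagnosis of either implementation.

```rust
/// Longest line measured in Unicode scalar values (chars).
/// Approximates display width only when every char is one column wide.
fn max_line_chars(text: &str) -> usize {
    text.lines().map(|l| l.chars().count()).max().unwrap_or(0)
}

/// Hypothetical byte-oriented count: only printable ASCII bytes
/// (0x20..=0x7E) advance the column; other bytes, such as the UTF-8
/// bytes 0xC3 and 0x83, are skipped. Tabs are ignored for simplicity.
fn max_line_printable_bytes(bytes: &[u8]) -> usize {
    bytes
        .split(|&b| b == b'\n')
        .map(|line| {
            line.iter()
                .filter(|b| b.is_ascii_graphic() || **b == b' ')
                .count()
        })
        .max()
        .unwrap_or(0)
}

fn main() {
    let input = "mÃ\n";
    println!("char-based width: {}", max_line_chars(input)); // 2
    println!(
        "printable-byte width: {}",
        max_line_printable_bytes(input.as_bytes())
    ); // 1
}
```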

cakebaker, Jan 13 '24 13:01

@sylvestre Do you still have the file somewhere? I suspect that this is just an encoding issue, because I can't reproduce it either. For reference, here's my /tmp/a:

$ hexdump -C /tmp/a
00000000  6d c3 83 0a                                       |m...|
00000004
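To confirm what those four bytes contain, here is a small sketch that decodes them as UTF-8 and reports each line's byte length and character count. The constant `FILE_BYTES` and the helper `line_stats` are hypothetical names; the byte values are copied from the hexdump above. The line holds 'm' (one byte) and U+00C3 'Ã' (two bytes, c3 83), so it is 3 bytes but 2 characters.

```rust
// Bytes shown by `hexdump -C /tmp/a`: 'm', the two-byte UTF-8
// encoding of U+00C3 ('Ã'), and a trailing newline.
const FILE_BYTES: [u8; 4] = [0x6d, 0xc3, 0x83, 0x0a];

/// Returns (byte length, char count) for each line, excluding the '\n'.
/// Returns None if the input is not valid UTF-8.
fn line_stats(bytes: &[u8]) -> Option<Vec<(usize, usize)>> {
    let text = std::str::from_utf8(bytes).ok()?;
    Some(text.lines().map(|l| (l.len(), l.chars().count())).collect())
}

fn main() {
    let text = std::str::from_utf8(&FILE_BYTES).expect("valid UTF-8");
    // Print each code point with the number of bytes it takes in UTF-8.
    for c in text.trim_end().chars() {
        println!("U+{:04X} {:?} -> {} byte(s)", c as u32, c, c.len_utf8());
    }
    println!("{:?}", line_stats(&FILE_BYTES)); // Some([(3, 2)])
}
```

If the reporter's file decodes differently (or not at all), that would support the encoding explanation.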

BenWiederhake, Mar 02 '24 12:03