`expr` is failing with multibyte chars
This causes the GNU test https://github.com/coreutils/coreutils/blob/master/tests/misc/expr-multibyte.pl to fail
$ ./target/debug/coreutils expr length αbcdef
7
GNU:
$ expr length αbcdef
6
Reproducing this requires a UTF-8 locale to be generated on the system, for example:
sudo locale-gen fr_FR.UTF-8
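Of course, this comes from how Rust represents strings: `String::len()` returns the number of bytes, not the number of characters. See https://doc.rust-lang.org/book/ch08-02-strings.html#internal-representation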
Simple testcase:
fn main() {
    let s = String::from("αbcdef");
    assert_eq!(s.len(), 6);
}
=>
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `7`,
 right: `6`', src/main.rs:3:5
I did some extra testing to check whether we need Unicode grapheme segmentation here, and we don't. GNU expr outputs a length of 2 for this flag emoji, which matches the `chars()` count rather than the grapheme count:
[src/main.rs:4] "🇳🇱".len() = 8
[src/main.rs:5] "🇳🇱".chars().count() = 2
[src/main.rs:6] UnicodeSegmentation::graphemes("🇳🇱", true).count() = 1
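To make the difference between the first two counts concrete, here is a minimal sketch using only the standard library. The helper names `byte_len` and `char_len` are made up for illustration and are not part of uutils; the grapheme count is left out because it needs the external `unicode-segmentation` crate.

```rust
/// Hypothetical helper: length of the UTF-8 encoding in bytes.
/// This is what `str::len()` reports.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Hypothetical helper: number of Unicode scalar values (`char`s).
/// In a UTF-8 locale this matches the GNU `expr length` results quoted above.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // 'α' takes two bytes in UTF-8; the five ASCII letters take one each.
    let s = "αbcdef";
    println!("{}: bytes = {}, chars = {}", s, byte_len(s), char_len(s));

    // The flag is two regional indicator symbols of four bytes each.
    let flag = "🇳🇱";
    println!("{}: bytes = {}, chars = {}", flag, byte_len(flag), char_len(flag));
}
```

Counting `char`s gives 6 for `αbcdef` and 2 for the flag, the same numbers GNU reports in a UTF-8 locale.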
Yeah, I am working on a fix :)
To reproduce: bash util/run-gnu-test.sh tests/misc/expr-multibyte
Actually, my patch was wrong: it should take the locale into account
$ LANG=C expr length αbcdef
7
$ LANG=fr_FR.UTF-8 expr length αbcdef
6
It seems that we should use MB_CUR_MAX to find the maximum number of bytes per character in the current locale
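As a rough sketch of what "taking the locale into account" could look like, the code below picks the effective locale name from the environment (assuming the usual POSIX precedence LC_ALL, then LC_CTYPE, then LANG) and counts characters only when that name mentions UTF-8. This is an approximation, not the actual uutils fix: it never calls into the C library or reads MB_CUR_MAX, and the helpers `effective_ctype`, `is_utf8_locale` and `expr_length` are hypothetical names.

```rust
use std::env;

/// Hypothetical helper: choose the locale name that governs character
/// handling, assuming the POSIX precedence LC_ALL > LC_CTYPE > LANG.
/// Unset or empty variables are skipped; the fallback is "C".
fn effective_ctype<'a>(
    lc_all: Option<&'a str>,
    lc_ctype: Option<&'a str>,
    lang: Option<&'a str>,
) -> &'a str {
    for value in [lc_all, lc_ctype, lang].iter().copied().flatten() {
        if !value.is_empty() {
            return value;
        }
    }
    "C"
}

/// Hypothetical helper: a simplistic check on the locale name only.
/// A real implementation would ask the C library instead.
fn is_utf8_locale(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    lower.contains("utf-8") || lower.contains("utf8")
}

/// Hypothetical helper: count characters in a UTF-8 locale, bytes otherwise.
fn expr_length(s: &str, locale: &str) -> usize {
    if is_utf8_locale(locale) {
        s.chars().count()
    } else {
        s.len()
    }
}

fn main() {
    // Read the three variables; missing or non-Unicode values become None.
    let lc_all = env::var("LC_ALL").ok();
    let lc_ctype = env::var("LC_CTYPE").ok();
    let lang = env::var("LANG").ok();

    let locale = effective_ctype(lc_all.as_deref(), lc_ctype.as_deref(), lang.as_deref());
    println!("locale = {}, length = {}", locale, expr_length("αbcdef", locale));
}
```

With these assumptions, `expr_length("αbcdef", "C")` is 7 and `expr_length("αbcdef", "fr_FR.UTF-8")` is 6, matching the two GNU runs above.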
Hi all, is this still an issue?
Chris
Closing this issue: it looks like it has been fixed in the meantime, and the GNU test tests/expr/expr-multibyte.pl passes.