Inconsistent comparison of Korean syllables vs individual jamo
During some experimentation, I ran the following:
let locale = locale!("ko").into();
let mut options = CollatorOptions::default();
options.strength = Some(Strength::Primary);
let collator = Collator::try_new(locale, options).unwrap();
println!("이 - {:?}", collator.compare("이", "ㅇㅣ"));
println!("일 - {:?}", collator.compare("일", "ㅇㅣㄹ"));
println!("읽 - {:?}", collator.compare("읽", "ㅇㅣㄹㄱ"));
The answers were: equal, greater, greater.
I don't know what the intended behavior is: whether the strings should compare equal, or whether precomposed syllables should sort separately from jamo. Either way, I am surprised that the three answers are not all the same, at least at primary strength.
It seems that syllables compare as equal to the individual jamo if there are no batchim (bottom consonants), but not if there are any batchim.
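Hangul syllables are supposed to compare equal with the corresponding conjoining jamo (the U+1100..U+11FF block), but the individual jamo here aren't conjoining jamo. They appear to be Hangul Compatibility Jamo (the U+3130..U+318F block), which are separate characters.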
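To make the distinction concrete, here is a minimal std-only sketch (it does not call ICU4X) that decomposes the three syllables using the arithmetic Hangul decomposition from the Unicode Standard and checks which block the typed jamo fall in. The helper names `decompose_syllable`, `is_conjoining_jamo`, and `is_compatibility_jamo` are hypothetical, and the sketch assumes the jamo in the report are the usual keyboard-produced compatibility jamo (U+3147, U+3163, U+3139, U+3131).

```rust
// Constants for arithmetic Hangul syllable decomposition (Unicode Standard, section 3.12).
const S_BASE: u32 = 0xAC00; // first precomposed syllable
const L_BASE: u32 = 0x1100; // first leading consonant (conjoining)
const V_BASE: u32 = 0x1161; // first vowel (conjoining)
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = 19 * N_COUNT; // 11172

/// Decompose a precomposed Hangul syllable into conjoining jamo.
/// Returns `None` for any character outside U+AC00..U+D7A3.
fn decompose_syllable(c: char) -> Option<Vec<char>> {
    let s_index = (c as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t = T_BASE + s_index % T_COUNT;
    let mut out = vec![char::from_u32(l)?, char::from_u32(v)?];
    // A trailing index of 0 means the syllable has no batchim.
    if t != T_BASE {
        out.push(char::from_u32(t)?);
    }
    Some(out)
}

/// True for characters in the Hangul Jamo block (conjoining jamo).
fn is_conjoining_jamo(c: char) -> bool {
    ('\u{1100}'..='\u{11FF}').contains(&c)
}

/// True for characters in the Hangul Compatibility Jamo block.
fn is_compatibility_jamo(c: char) -> bool {
    ('\u{3130}'..='\u{318F}').contains(&c)
}

fn main() {
    // Assumption: the jamo strings are the compatibility jamo a Korean keyboard emits.
    let cases = [
        ('이', "\u{3147}\u{3163}"),
        ('일', "\u{3147}\u{3163}\u{3139}"),
        ('읽', "\u{3147}\u{3163}\u{3139}\u{3131}"),
    ];
    for (syllable, typed) in cases {
        let decomposed: Vec<u32> = decompose_syllable(syllable)
            .expect("precomposed Hangul syllable")
            .into_iter()
            .map(|c| c as u32)
            .collect();
        let typed_points: Vec<u32> = typed.chars().map(|c| c as u32).collect();
        println!(
            "{} decomposes to {:04X?}; typed jamo are {:04X?} (all compatibility jamo: {})",
            syllable,
            decomposed,
            typed_points,
            typed.chars().all(is_compatibility_jamo)
        );
    }
}
```

Note that the batchim of 읽 decomposes to a single conjoining trailing consonant, whereas the typed string spells it as two separate compatibility jamo, so the two sequences differ in length as well as in code points.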
I agree that it's surprising that non-conjoining and conjoining jamo apparently aren't primary-equal, and I don't know enough of the background to say why or how intentional that is.
The UCA spec discusses multiple methods of handling conjoining jamo, and I'm not sure which one ICU4C and ICU4X use.