birt icon indicating copy to clipboard operation
birt copied to clipboard

Use com.ibm.icu.text.Collator instead of String.compare by default

Open merks opened this issue 2 weeks ago • 4 comments

https://github.com/eclipse-birt/birt/issues/2329

merks avatar Dec 11 '25 08:12 merks

I think this is a good change but I want to be sure that you both agree this this is a good change.

merks avatar Dec 11 '25 09:12 merks

I don't expect a collator to treat two different characters as equals and in fact that isn't the case:

image

Anyway, my time is limited and I don't have domain knowledge. I simple see results that scream wrong to me and this is where that comes from.

merks avatar Dec 11 '25 11:12 merks

I don't know if the problem I described actually exists. Certainly not on our machines which are configured as German or English. But who knows which Collator configuration getInstance() returns on other machines?

Well, at least there probably is a way to explicitly specify the locale (and collator) when starting the JVM. I always forget how this works in detail, however. I'm not an expert for ICU4J. Given that this option exists, it should be possible to effectively the previous behavior using some configuration option.

Maybe @wimjongman or @speckyspooky or someone knows more.

I still think a cleaner solution would be to use the same approach for sorting as the table item does.

hvbtup avatar Dec 11 '25 14:12 hvbtup

I don't see how to "inject" a cleaner solution into this call stack. I had a hard enough time tracking this down and finally resorted to a conditional breakpoint on String.compare to find it, assuming that was being called. I don't know which collator is being for the regular table.

In the end, I just see something that works poorly for all languages, always, everywhere. Even the simplest collator will help with things like keeping 'A' and 'a' together in sorted order. I don't disagree with your concerns, I just don't have knowledge of the big picture, and using String.compare as a default just screams out as a bad idea.

Anyway, I have done my best here with my limited time and knowledge. If someone wants to fix this is a better way, that will need to be done by the someone who wants that. No insult to criticism intended!

merks avatar Dec 11 '25 14:12 merks

I read your discussion and the examples. I have also reviewed the ICU4J-Collator.

My recommendation would be to try it with ICU4J. The locale from the JVM start -nl will be exchanged to the ICU4J and works with it. If there any issue with the local the default "en_EN" will be used (debug result from the zh_TW topic).

@merks Only one question which cames up due to my ICU4J investigation. The documentation confused me because the methode compare(Object obj1, Object ob2) use internally doCompare((CharSequence)source, (CharSequence)target). But doCompare is marked as deprecated. Would it be better to use the string bases method of the collator compare(String obj1, String ob2)...?

Screen from ICU4J: image

speckyspooky avatar Dec 17 '25 15:12 speckyspooky

@speckyspooky

Calling doCompare which casts each argument to CharSeuqnce and then calls toString is kind of strange. We do know both arguments are String so can cast to String which appears would be a bit more efficient, so I changed that.

@hvbtup

You're review is still a blocking review...

merks avatar Dec 18 '25 06:12 merks

I still believe a clean solution would be to perform comparions in the same way the Table item does. And I still believe it is risky to change such a central method. But I don't find the time to investigate further, and Thomas knows the details bettter.

How about commiting this PR, and at the same time document in the release note that this is a breaking change?

hvbtup avatar Dec 18 '25 07:12 hvbtup