Alphabetical order does not consider accented characters in Crosstab
Birt Version: 4.21.0
Description:
In a report containing a crosstab, one column displays student names that must be sorted in alphabetical order. However, when a name begins with an accented character (e.g., 'Á'), it is incorrectly placed at the end of the list instead of being sorted appropriately based on the character's alphabetical position.
This sorting issue does not occur when using the "Table" field, which handles the ordering correctly. This suggests a potential bug in the crosstab's sorting logic for accented characters.
@yasmimIPM
I'm sure that somewhere we would just need to use an appropriate collator. Probably a simple change. But, being the person who does the release engineering, I have no idea where I should look to address the specific problem you describe.
You might consider contributing a fix:
https://github.com/eclipse-birt/birt?tab=readme-ov-file#create-a-birt-development-environment
Failing that, and failing someone else jumping in, I would need to know how to reproduce your exact problem. A simple test case with which I can reproduce the problem locally, without access to your data set, would help with that.
@merks
First, I apologize for the delay in responding.
Due to the sensitive nature of handling children's private information, I'm unable to provide the actual dataset I'm using. However, I've attached the rptdesign.zip file, which includes multiple test reports using both cross-tabs and tables. To use these, simply provide a sample dataset containing names that start with accented characters (e.g., Ágatha or Ágnes), bind it to the report, and ensure the columns match accordingly.
To use these, simply provide a sample dataset containing names that start with accented characters (e.g., Ágatha or Ágnes), bind it to the report, and ensure the columns match accordingly.
Well, can you provide such a dataset yourself? This is work you should not expect from someone else who is helping you unpaid.
@hvbtup I'm sorry for my lack of initiative.
To help demonstrate the issue, I'm attaching a much simpler example.zip file with a table and a cross tab, along with a mock dataset you can use to reproduce the error.
In this test case, you can see that the table sorts the names correctly, taking the accented characters into account, whereas the cross tab fails to do so.
I see. I tested this with Germany instead of Portugal and can confirm this issue. Looks like a bug to me. In particular, the locale settings seem to be completely ignored. It seems the sorting is strictly by Unicode code points. The sorting itself (of the cross tab) is NOT ignored. I tested this by requesting reverse order.
I had to change the URI to be absolute for this to work. The problem is that here it's using String.compareTo, which compares based on Unicode character order instead of using a collator.
Thread [Worker-16: Rendering report] (Suspended (breakpoint at line 2043 in String))
String.compareTo(String) line: 2043
BaseScriptEvalUtil.compare(Object, Object, BaseCompareHints) line: 310
BaseScriptEvalUtil.compare(Object, Object) line: 461
CompareUtil.compare(Object, Object) line: 51
CompareUtil.compare(Object[], Object[], boolean[]) line: 41
CompareUtil.compare(Object[], Object[]) line: 32
DimensionRow.compareTo(Object) line: 88
BaseDiskSortedStack$1.compare(Object, Object) line: 110
3 collapsed frames
TimSort<T>.binarySort(T[], int, int, int, Comparator<? super T>) line: 296
TimSort<T>.sort(T[], int, int, Comparator<? super T>, T[], int, int) line: 221
Arrays.sort(T[], int, int, Comparator<? super T>) line: 1308
DiskSortedStack(BaseDiskSortedStack).sort(Object[], int, int) line: 205
DiskSortedStack(BaseDiskSortedStack).initPop() line: 297
DiskSortedStack(BaseDiskSortedStack).pop() line: 242
Hierarchy.saveHierarchyRows(ILevelDefn[], int[][], int[][], DiskSortedStack, StopSign) line: 301
Hierarchy.createAndSaveHierarchy(IDatasetIterator, ILevelDefn[], StopSign) line: 142
CubeMaterializer.createHierarchy(String, String, IDatasetIterator, ILevelDefn[], StopSign) line: 121
DataRequestSessionImpl.populateDimension(CubeMaterializer, DimensionHandle, TabularCubeHandle, Map, SecurityListener) line: 1309
DataRequestSessionImpl.populateDimensions(CubeMaterializer, TabularCubeHandle, Map, SecurityListener) line: 1210
DataRequestSessionImpl.createCube(TabularCubeHandle, CubeMaterializer, Map) line: 797
DataRequestSessionImpl.materializeCube(CubeHandle, Map) line: 715
DataRequestSessionImpl.execute(IBasePreparedQuery, IBaseQueryResults, ScriptContext) line: 598
DataGenerationEngine(DteDataEngine).doExecuteCube(IBaseResultSet, ICubeQueryDefinition, Object, boolean) line: 194
DataGenerationEngine.doExecuteCube(IBaseResultSet, ICubeQueryDefinition, Object, boolean) line: 85
DataGenerationEngine(AbstractDataEngine).execute(IBaseResultSet, IDataQueryDefinition, Object, boolean) line: 256
ExecutorManager$ExecutorContext.executeQuery(IBaseResultSet, IDataQueryDefinition, Object) line: 405
CrosstabReportItemExecutor(BaseCrosstabExecutor).executeQuery(AbstractCrosstabItemHandle) line: 116
CrosstabReportItemExecutor.execute() line: 96
ExtendedItemExecutor.execute() line: 59
ReportItemEmitterExecutor(WrappedReportItemExecutor).execute() line: 45
ReportItemEmitterExecutor.execute() line: 45
SuppressDuplicateItemExecutor.execute() line: 41
LocalizedReportItemExecutor(WrappedReportItemExecutor).execute() line: 45
LocalizedReportItemExecutor.execute() line: 34
HTMLPageLM(HTMLBlockStackingLM).layoutNodes() line: 62
HTMLPageLM.layout() line: 92
HTMLReportLayoutEngine.layout(IReportExecutor, IReportContent, IContentEmitter, boolean) line: 97
ReportDocumentBuilder.build() line: 226
RunTask.doRun() line: 224
RunTask.run(IDocArchiveWriter) line: 108
StaticHTMLViewer(AbstractViewer).createReportDocument(String, String, Map) line: 90
StaticHTMLViewer.renderReport(IProgressMonitor) line: 666
StaticHTMLViewer$16.work(IProgressMonitor) line: 761
StaticHTMLViewer$16(AbstractJob).run(IProgressMonitor) line: 48
1 collapsed frames
So it's doing this comparison, for example:
I don't really know how best to fix such a problem. Maybe it would be better if it used some "default collator/comparator" here instead of just the plain old string compare, which is almost always going to produce undesirable results except for en_US:
E.g.,
That produces a better order:
Opinions anyone? (Hopefully that doesn't break some test expectations.)
Just want to say that I know this kind of problem from other software as well (e.g. from Oracle, at least with older releases). I don't know how to fix this. As long as the bug persists, it may help to display an information message before or after the cross tab that explains the ordering.
I don't know how to fix this.
This fixes the problem using com.ibm.icu.text.Collator:
The question is, are folks comfortable with that fix? It seems to me using the collator for the locale is always better than using unicode order, which I think doesn't even get upper/lower case correct.
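For anyone who wants to see the difference outside of BIRT, here is a minimal standalone sketch. It is not the actual patch: it uses the JDK's java.text.Collator as a stand-in for com.ibm.icu.text.Collator, and the class name, method names and sample names are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: hypothetical class, not part of BIRT.
class CollationDemo {

    // Sorts the way the stack trace does: String.compareTo compares
    // UTF-16 code units, so 'Z' (90) < 'b' (98) < '\u00C1' (193).
    static List<String> sortByCodeUnit(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(String::compareTo);
        return copy;
    }

    // Sorts with a locale-sensitive collator. Base letters are compared
    // first; accents and case only break ties between otherwise equal strings.
    static List<String> sortWithCollator(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00C1gatha" is "Ágatha", escaped so the source file encoding does not matter.
        List<String> names = List.of("Bruno", "\u00C1gatha", "Zeca", "Ana", "bruna");

        System.out.println(sortByCodeUnit(names));
        // [Ana, Bruno, Zeca, bruna, Ágatha]

        System.out.println(sortWithCollator(names, Locale.forLanguageTag("pt-PT")));
        // [Ágatha, Ana, bruna, Bruno, Zeca]
    }
}
```

The first result matches what the cross tab shows today (accented names at the end, lower case after all upper case). The second is what a collator gives.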
Well, this is a change in behavior. In theory, this might be a breaking change, because the output sorting sequence has changed.
But for me, I would consider this a bug fix. And if someone absolutely needs the old (from my POV: wrong) behavior, they can easily add the locale settings for the sorting of the cross tab.
So I'm +1 for fixing this.
But Collator.getInstance() doesn't seem right.
On your machine, and on my machine, and on the OP's machine, the result might be right, but only because the locale settings match our personal expectations and because
Wouldn't we need to supply the settings from the cross tab to the constructor or something like that?
I'm not an expert but I think just using getInstance() returns a default collator based on system properties or env variables - which usually, but not always, match the settings explicitly specified in the report.
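To make the concern concrete, here is a rough sketch of what "supplying the locale" could look like. It assumes the report locale is somehow available at the comparison site, which is not verified against the BIRT code. collatorFor is a hypothetical helper, and java.text.Collator stands in for the ICU class (which has analogous getInstance() and getInstance(Locale) methods).

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: hypothetical class, not part of BIRT.
class LocaleAwareSort {

    // Hypothetical helper: prefer an explicitly supplied locale and fall
    // back to the JVM default (what the no-argument getInstance() uses).
    static Collator collatorFor(Locale reportLocale) {
        return reportLocale != null
                ? Collator.getInstance(reportLocale)
                : Collator.getInstance();
    }

    static List<String> sort(List<String> values, Locale reportLocale) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(collatorFor(reportLocale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00C4rlig" is "Ärlig", escaped so the source file encoding does not matter.
        List<String> values = List.of("Zebra", "\u00C4rlig");

        // German rules sort 'Ä' together with 'A', so "Ärlig" comes first.
        System.out.println(sort(values, Locale.GERMAN));

        // Swedish rules treat 'Ä' as a separate letter near the end of the
        // alphabet, so the order is expected to differ here (this depends
        // on the locale data shipped with the JVM).
        System.out.println(sort(values, Locale.forLanguageTag("sv")));

        // No explicit locale: the result depends on the machine's default.
        System.out.println(sort(values, null));
    }
}
```

The last call is the case in question: the same report could sort differently on two servers if only the default locale is used.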
That method is really widely used already. 🤪
I can create a PR and see how it goes with the tests...
This is somewhat concerning, but obviously no one really complained about it in 15 years. Using the default Collator is still much better than using no Collator at all. So I'm still +1 for your change.
The fix is available here:
https://download.eclipse.org/birt/updates/nightly/latest/