Alphabetical order not considering accented character in Crosstab

Open yasmimIPM opened this issue 1 month ago • 11 comments

Birt Version: 4.21.0

Description:

In a report containing a crosstab, one column displays student names that must be sorted in alphabetical order. However, when a name begins with an accented character (e.g., 'Á'), it is incorrectly placed at the end of the list instead of being sorted appropriately based on the character's alphabetical position.

This sorting issue does not occur when using the "Table" field, which handles the ordering correctly. This suggests a potential bug in the crosstab's sorting logic for accented characters.

yasmimIPM avatar Nov 25 '25 17:11 yasmimIPM

@yasmimIPM

I'm sure somewhere we would just need to use an appropriate collator. Probably a simple change. But, being the person who does the release engineering, I have no idea where I should look to address the specific problem you describe.

You might consider contributing a fix:

https://github.com/eclipse-birt/birt?tab=readme-ov-file#create-a-birt-development-environment

Failing that, and failing someone else jumping in, I would need to know how to reproduce your exact problem. A simple test case with which I can reproduce the problem locally, without access to your data set, would help with that.

merks avatar Dec 03 '25 07:12 merks

@merks

First, I apologize for the delay in responding.

Due to the sensitive nature of handling children's private information, I'm unable to provide the actual dataset I'm using. However, I've attached the rptdesign.zip file, which includes multiple test reports using both cross-tabs and tables. To use these, simply provide a sample dataset containing names that start with accented characters (e.g., Ágatha or Ágnes), bind it to the report, and ensure the columns match accordingly.

yasmimIPM avatar Dec 08 '25 12:12 yasmimIPM

To use these, simply provide a sample dataset containing names that start with accented characters (e.g., Ágatha or Ágnes), bind it to the report, and ensure the columns match accordingly.

Well, can you provide such a dataset yourself? This is work you should not expect from someone else who is helping you unpaid.

hvbtup avatar Dec 10 '25 17:12 hvbtup

@hvbtup I'm sorry for my lack of initiative.

To help demonstrate the issue, I'm attaching a much simpler example.zip file with a table and a cross tab, along with a mock dataset you can use to reproduce the error.

In this test case, you can see that the table sorts the names correctly, taking the accented characters into account, whereas the cross tab fails to do so.

yasmimIPM avatar Dec 10 '25 17:12 yasmimIPM

I see. I tested this with Germany instead of Portugal and can confirm this issue. Looks like a bug to me. In particular, the locale settings seem to be completely ignored. Seems like the sorting is strictly by unicode code points. The sorting itself (of the cross tab) is NOT ignored. I tested this by requesting reverse order.

hvbtup avatar Dec 11 '25 07:12 hvbtup

I had to change the URI to be absolute for this to work. The problem is that here it's using String.compareTo, which compares based on Unicode character order rather than using a collator.

Thread [Worker-16: Rendering report] (Suspended (breakpoint at line 2043 in String))	
	String.compareTo(String) line: 2043	
	BaseScriptEvalUtil.compare(Object, Object, BaseCompareHints) line: 310	
	BaseScriptEvalUtil.compare(Object, Object) line: 461	
	CompareUtil.compare(Object, Object) line: 51	
	CompareUtil.compare(Object[], Object[], boolean[]) line: 41	
	CompareUtil.compare(Object[], Object[]) line: 32	
	DimensionRow.compareTo(Object) line: 88	
	BaseDiskSortedStack$1.compare(Object, Object) line: 110	
	3 collapsed frames	
		TimSort<T>.binarySort(T[], int, int, int, Comparator<? super T>) line: 296	
		TimSort<T>.sort(T[], int, int, Comparator<? super T>, T[], int, int) line: 221	
		Arrays.sort(T[], int, int, Comparator<? super T>) line: 1308	
	DiskSortedStack(BaseDiskSortedStack).sort(Object[], int, int) line: 205	
	DiskSortedStack(BaseDiskSortedStack).initPop() line: 297	
	DiskSortedStack(BaseDiskSortedStack).pop() line: 242	
	Hierarchy.saveHierarchyRows(ILevelDefn[], int[][], int[][], DiskSortedStack, StopSign) line: 301	
	Hierarchy.createAndSaveHierarchy(IDatasetIterator, ILevelDefn[], StopSign) line: 142	
	CubeMaterializer.createHierarchy(String, String, IDatasetIterator, ILevelDefn[], StopSign) line: 121	
	DataRequestSessionImpl.populateDimension(CubeMaterializer, DimensionHandle, TabularCubeHandle, Map, SecurityListener) line: 1309	
	DataRequestSessionImpl.populateDimensions(CubeMaterializer, TabularCubeHandle, Map, SecurityListener) line: 1210	
	DataRequestSessionImpl.createCube(TabularCubeHandle, CubeMaterializer, Map) line: 797	
	DataRequestSessionImpl.materializeCube(CubeHandle, Map) line: 715	
	DataRequestSessionImpl.execute(IBasePreparedQuery, IBaseQueryResults, ScriptContext) line: 598	
	DataGenerationEngine(DteDataEngine).doExecuteCube(IBaseResultSet, ICubeQueryDefinition, Object, boolean) line: 194	
	DataGenerationEngine.doExecuteCube(IBaseResultSet, ICubeQueryDefinition, Object, boolean) line: 85	
	DataGenerationEngine(AbstractDataEngine).execute(IBaseResultSet, IDataQueryDefinition, Object, boolean) line: 256	
	ExecutorManager$ExecutorContext.executeQuery(IBaseResultSet, IDataQueryDefinition, Object) line: 405	
	CrosstabReportItemExecutor(BaseCrosstabExecutor).executeQuery(AbstractCrosstabItemHandle) line: 116	
	CrosstabReportItemExecutor.execute() line: 96	
	ExtendedItemExecutor.execute() line: 59	
	ReportItemEmitterExecutor(WrappedReportItemExecutor).execute() line: 45	
	ReportItemEmitterExecutor.execute() line: 45	
	SuppressDuplicateItemExecutor.execute() line: 41	
	LocalizedReportItemExecutor(WrappedReportItemExecutor).execute() line: 45	
	LocalizedReportItemExecutor.execute() line: 34	
	HTMLPageLM(HTMLBlockStackingLM).layoutNodes() line: 62	
	HTMLPageLM.layout() line: 92	
	HTMLReportLayoutEngine.layout(IReportExecutor, IReportContent, IContentEmitter, boolean) line: 97	
	ReportDocumentBuilder.build() line: 226	
	RunTask.doRun() line: 224	
	RunTask.run(IDocArchiveWriter) line: 108	
	StaticHTMLViewer(AbstractViewer).createReportDocument(String, String, Map) line: 90	
	StaticHTMLViewer.renderReport(IProgressMonitor) line: 666	
	StaticHTMLViewer$16.work(IProgressMonitor) line: 761	
	StaticHTMLViewer$16(AbstractJob).run(IProgressMonitor) line: 48	
	1 collapsed frames	

So it's doing this comparison, for example:

Image

I don't really know how best to fix such a problem. Maybe it would be better to use some "default collator/comparator" here instead of just the plain old string compare, which is almost always going to produce undesirable results except for en_US:

Image

E.g.,

Image

That produces a better order:

Image

Opinions anyone? (Hopefully that doesn't break some test expectations.)

merks avatar Dec 11 '25 07:12 merks

Just want to say that I know this kind of problem from other software as well (e.g. from Oracle, at least with older releases). I don't know how to fix this. As long as the bug persists, it may help to display an information message before or after the cross tab that explains the ordering.

hvbtup avatar Dec 11 '25 08:12 hvbtup

I don't know how to fix this.

This fixes the problem using com.ibm.icu.text.Collator:

Image

The question is, are folks comfortable with that fix? It seems to me using the collator for the locale is always better than using unicode order, which I think doesn't even get upper/lower case correct.
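
For readers without the screenshots, here is a minimal sketch of the difference being discussed. It is not the actual BIRT patch: it uses the JDK's java.text.Collator rather than com.ibm.icu.text.Collator so that it runs without extra dependencies, and the class name, method names, and sample names are made up for illustration. Non-ASCII letters are written as Unicode escapes (\u00C1 is A with an acute accent).

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of BIRT.
class CollatorSortSketch {

    // Sorts a copy of the list with String.compareTo, i.e. by UTF-16 code unit value.
    static List<String> sortByStringCompareTo(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Sorts a copy of the list with a locale-aware collator.
    static List<String> sortWithCollator(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C1gatha" is "Agatha" with an acute accent on the first letter.
        List<String> names = Arrays.asList("bruno", "Ana", "\u00C1gatha", "Carla");

        // Upper case sorts before lower case, and the accented name comes last.
        System.out.println(sortByStringCompareTo(names));

        // Accent and case only break ties, so the accented name sorts next to "Ana".
        System.out.println(sortWithCollator(names, Locale.forLanguageTag("pt-PT")));
    }
}
```

The plain comparison puts "bruno" after "Carla" and the accented name at the very end, which matches both the reported symptom and the upper/lower case concern above.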

merks avatar Dec 11 '25 08:12 merks

Well, this is a change in behavior. In theory, this might be a breaking change, because the output sorting sequence has changed.

But for me, I would consider this a bug fix. And if someone absolutely needs the old (from my POV: wrong) behavior, they can easily add the locale settings for the sorting of the cross tab.

So I'm +1 for fixing this.

But Collator.getInstance() doesn't seem right. On your machine, and on my machine, and on the OP's machine, the result might be right, but only because the locale settings match our personal expectations and because

Wouldn't we need to supply the settings from the cross tab to the constructor or something like that? I'm not an expert, but I think just using getInstance() returns a default collator based on system properties or environment variables, which usually, but not always, match the settings explicitly specified in the report.
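
To make the concern concrete, here is a rough sketch of the idea, again using the JDK's java.text.Collator as a stand-in for the ICU class. The helper collatorFor and its reportLocale parameter are hypothetical. How BIRT would actually pass the cross tab's locale down to the comparison code is not shown in this thread, so this only illustrates the difference between the two getInstance variants.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of BIRT.
class ReportLocaleCollatorSketch {

    // Hypothetical helper: prefer the locale configured in the report and
    // fall back to the JVM default locale only when none is given.
    static Collator collatorFor(Locale reportLocale) {
        if (reportLocale == null) {
            // Depends on Locale.getDefault(), i.e. on how the JVM was started.
            return Collator.getInstance();
        }
        // Independent of the machine the report runs on.
        return Collator.getInstance(reportLocale);
    }

    public static void main(String[] args) {
        // "\u00C4rger" starts with A-umlaut.
        String accented = "\u00C4rger";
        String plain = "Zebra";

        System.out.println("JVM default locale: " + Locale.getDefault());

        // Negative result: the accented word sorts before "Zebra" under German rules.
        System.out.println(collatorFor(Locale.GERMAN).compare(accented, plain));

        // Result of the fallback collator depends on the default locale printed above.
        System.out.println(collatorFor(null).compare(accented, plain));

        // Positive result: plain String.compareTo puts the accented word after "Zebra".
        System.out.println(accented.compareTo(plain));
    }
}
```

With the no-argument getInstance(), the same report could in principle sort differently on two servers that were started with different default locales.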

hvbtup avatar Dec 11 '25 08:12 hvbtup

That method is really widely used already. 🤪

Image

I can create a PR and see how it goes with the tests...

merks avatar Dec 11 '25 08:12 merks

This is somewhat concerning, but obviously no one really complained about it in 15 years. Using the default Collator is still much better than using no Collator at all. So I'm still +1 for your change.

hvbtup avatar Dec 11 '25 08:12 hvbtup

The fix is available here:

https://download.eclipse.org/birt/updates/nightly/latest/

merks avatar Dec 21 '25 15:12 merks