embedded-postgres icon indicating copy to clipboard operation
embedded-postgres copied to clipboard

ORDER BY not working with capital case

Open rayaanrizwan1234 opened this issue 1 year ago • 4 comments

When doing an ORDER BY ASC it seems to get back the ones with capital case first. For example, if we had a column and a record with 'Z' and another with 'a', doing an ORDER BY ASC on that column will bring the record with 'Z' first then the other. Any way to fix this?

org.opentest4j.AssertionFailedError: expected: <[SimpleLookup(code=null - null, description=Desc, active=true), SimpleLookup(code=null - RG, description=Desc, active=true)]> but was: <[SimpleLookup(code=null - RG, description=Desc, active=true), SimpleLookup(code=null - null, description=Desc, active=true)]> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1145)

rayaanrizwan1234 avatar Apr 02 '24 13:04 rayaanrizwan1234

This is not a bug - it is a consequence how letters are represented.

For example in Unicode: List of Unicode characters - Latin Script

  • Z - Latin Capital letter Z - code 90
  • a - Latin Small Letter A - code 97

Thus Z goes before a.

If you want to compare ignoring the case, convert both sides to same case using UPPER(columnA) or LOWER(columnA).

If you want more details: You may be using a different character set for strings: 24.3.1. Supported Character Sets However, the code points for Latin Script are shared across many character sets, and were introduced by ASCII encoding

lesiak avatar Apr 08 '24 12:04 lesiak

I'm not an expert in this area, but I think it should also be possible to use ICU collation settings:

  • https://www.postgresql.org/docs/current/collation.html#ICU-CUSTOM-COLLATIONS
  • https://peter.eisentraut.org/blog/2022/09/26/icu-features-in-postgresql-15
  • https://www.2ndquadrant.com/en/blog/icu-support-postgresql-10/

tomix26 avatar Apr 08 '24 12:04 tomix26

Thanks for getting back on this. What character set does the embedded Postgres use? and is there a way to specify which one it uses?

rayaanrizwan1234 avatar Apr 22 '24 12:04 rayaanrizwan1234

Sorry for the late reply, it should be UTF8: https://github.com/zonkyio/embedded-postgres/blob/master/src/main/java/io/zonky/test/db/postgres/embedded/PreparedDbProvider.java#L243

tomix26 avatar Jul 19 '24 13:07 tomix26

Closing the issue. Feel free to reopen it if you have more questions.

tomix26 avatar Sep 27 '24 12:09 tomix26