xbmc icon indicating copy to clipboard operation
xbmc copied to clipboard

Use ICU to Compare Strings on Android

Open taoyouh opened this issue 4 years ago • 18 comments

Description

Use ICU library to compare strings on Android, thus locale-aware sorting was supported.

Motivation and Context

Locale-aware sorting was not supported on Android (#15125).

How Has This Been Tested?

Tested on my phone (Mi 9 Pro with Android 9), with locales set to English (US), Chinese (Simplified), and Chinese (Traditional). See screenshots for results.

Screenshots (if appropriate):

English (US) locale

Accented letter "Ö" ordered before "O": Screenshot_2020-05-11-13-47-54-623_org xbmc kodi Chinese characters ordered by Unicode value: Screenshot_2020-05-11-13-48-04-633_org xbmc kodi

Chinese (Simplified) locale

Accented letter "Ö" ordered before "O": Screenshot_2020-05-11-13-48-59-902_org xbmc kodi Chinese characters ordered by pinyin (phonetic transcriptions of Chinese characters): Screenshot_2020-05-11-13-48-43-763_org xbmc kodi Numeric ordering works: Screenshot_2020-05-11-13-59-14-944_org xbmc kodi

Chinese (Traditional) locale

Chinese characters ordered by number of strokes: Screenshot_2020-05-11-14-00-15-264_org xbmc kodi

Types of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Clean up (non-breaking change which removes non-working, unmaintained functionality)
  • [x] Improvement (non-breaking change which improves existing functionality)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that will cause existing functionality to change)
  • [ ] Cosmetic change (non-breaking change that doesn't touch code)
  • [ ] None of the above (please explain below)

Checklist:

  • [x] My code follows the Code Guidelines of this project
  • [x] My change requires a change to the documentation, either Doxygen or wiki
  • [x] I have updated the documentation accordingly
  • [x] I have read the Contributing document
  • [ ] I have added tests to cover my change
  • [ ] All new and existing tests passed

taoyouh avatar May 10 '20 06:05 taoyouh

CI fails saying ERROR: $GIT_COMMIT (6fa2c47919cca4c27bc0778178b2f62c0dde67d3) doesn't match currently checked out revision (c41d2f0ce0f329e5b59554c8f5c0667a8e73be7f) Build step 'Execute shell' marked build as failure. Why does this happen?

taoyouh avatar May 11 '20 06:05 taoyouh

If you do a force push while Jenkins is running, it'll usually stuff it up for that PR. So any subsequent pushes will generally not run. Best to try not force push if your PR Jenkins run hasn't completed fully. Annoying, but that's what we have to deal with at times unfortunately.

Jenkins build this please

fuzzard avatar May 11 '20 06:05 fuzzard

The latest change should have solved the build error, but Jenkins is not working again.

taoyouh avatar May 11 '20 16:05 taoyouh

I've resumed the build on jenkins and it's green now. Just a note: before merging, we can to upload libicundk to our mirrors.

wsnipex avatar May 11 '20 19:05 wsnipex

I am looking forward to this feature. Are there any further progress?

taoyouh avatar May 25 '20 02:05 taoyouh

@peak3d Did you have time to look at this yet? It's a very long-standing problem unfortunately.

yol avatar Aug 17 '20 07:08 yol

@taoyouh thanks you for this contribution. I can not advise you on the Android specific aspects of this PR, but I have been working to make it possible to use consistent collated sorting both in memory (itemlists) and in the databases directly. Some of what you have done needs to be modified with that in mind. I will try to explain as clearly as I can, but it may not be easy. See description in https://github.com/xbmc/xbmc/pull/17838 which should help.

Historically all sorting was done in memory, which is inefficient in some scenarios e.g. fetching entire tables from the database when only the top 10 records in a certain order are wanted. The long term aim is to do as much record sorting as possible in SQL on the db side, but we are only part the way there. Initially the approach is being prototyped for the music database, with video to follow.

Sorting in memory uses StringUtils::AlphaNumericCompare to provide case insensitive, natural number order, (sometimes) language collated sorting of text. To get the same sorting results in SQLite a callback collation function is used, implemented in AlphaNumericCollation https://github.com/xbmc/xbmc/blob/ebaca1f1a190e05aaab821481a2233f2c6f4d95c/xbmc/utils/StringUtils.cpp#L1194 There was no efficient way to combine these into one (I can explain in detail later if you want), hence changes to collation implementation will need to apply to both functions.

Then there are those on client/server using MySQL/MariaDB for databases. This does not support collation callbacks, but a comparable natural number sorting etc. can be implemented via database functions. Hence for consistency between memory and db sorting, AlphaNumericCompare is adjusted to match MySQL collation when using that RDBMS. The same approach is also applied for those platforms that do not support the C++ locale facet for collation e.g Android and RPi. This is the purpose of g_langInfo.UseLocaleCollation(), negation meaning use MySQL comparible collation because either a) using MySQL, b) locale facet is unsupported or c) user choice.

You want to ICU compare to be used on Android instead of C++ locale facet, but not as you have implemented it replacing MySQL style collation. Hence UseLocaleCollation() will need to be set differently (maybe even renamed?) to include ICU on Android as positive "locale supported", and the Android specific code IFDEF be replacing what happens when UseLocaleCollation() is true.

Phew! I'm sure that all sounds worse than it really is. Ask if you need me to clarify.

DaveTBlake avatar Aug 17 '20 15:08 DaveTBlake

@DaveTBlake Finally I have time to understand this. Do you mean that I need to a) rewrite the SQLite collation callback to use ICU too, b) and make UseLocaleCollation() be true when Android ICU is used for collation?

taoyouh avatar Sep 30 '20 07:09 taoyouh

@taoyouh sorry for delay, I lost sight of this.

a) rewrite the SQLite collation callback to use ICU too,

Yes, StringUtils::AlphaNumericCollation must result in the same order as StringUtils::AlphaNumericCompare. Take care to do so without slowing the collation callback function.

b) and make UseLocaleCollation() be true when Android ICU is used for collation?

When UseLocaleCollation() is true then use locale facet or ICU for collation on Android depending on platform.

Ensure UseLocaleCollation() is set false (so MySQL comparible collation is perfromed) when any one of a) using a MySQL client/server setup, b) neither locale facet nor Android ICU is supported or c) user choice. This means a change to CLangInfo::UseLocaleCollation() to check that locale collation facet or ICU is implemented on the platform. It could be sensible to rename this method.

DaveTBlake avatar Oct 12 '20 11:10 DaveTBlake

This feature is very welcome @taoyouh but bumping it to v20 as v19 entering Beta phase with focus on testing and bug fixes, and this improvement needs more extensive work.

DaveTBlake avatar Oct 29 '20 14:10 DaveTBlake

I guess this is the only sane option to fix #19883 properly on all platforms, not just Android.

basilgello avatar Jun 25 '21 20:06 basilgello

@taoyouh sorry for delay, I lost sight of this.

a) rewrite the SQLite collation callback to use ICU too,

Yes, StringUtils::AlphaNumericCollation must result in the same order as StringUtils::AlphaNumericCompare. Take care to do so without slowing the collation callback function.

b) and make UseLocaleCollation() be true when Android ICU is used for collation?

When UseLocaleCollation() is true then use locale facet or ICU for collation on Android depending on platform.

Ensure UseLocaleCollation() is set false (so MySQL comparible collation is perfromed) when any one of a) using a MySQL client/server setup, b) neither locale facet nor Android ICU is supported or c) user choice. This means a change to CLangInfo::UseLocaleCollation() to check that locale collation facet or ICU is implemented on the platform. It could be sensible to rename this method.

I think I finally finished these works after a long time. I wish my change still works with the latest master codes.

taoyouh avatar Nov 12 '21 11:11 taoyouh

@DaveTBlake Hi, do you have some time to check this out? I just rebased it onto the latest master branch.

taoyouh avatar Dec 25 '21 02:12 taoyouh

I'm looking forward to this feature

taoyouh avatar Jan 09 '22 14:01 taoyouh

@DaveTBlake I implemented the AlphaNumericCollation method but didn't find a way to test it. Could you please point out some situation under which in-database sorting will be used?

taoyouh avatar Apr 26 '22 03:04 taoyouh

I've made some formatting changes to meet the current code style. The diffs are available in the following links:

For more information please see our current code style guidelines.

jenkins4kodi avatar Apr 26 '22 04:04 jenkins4kodi

@DaveTBlake I implemented the AlphaNumericCollation method but didn't find a way to test it. Could you please point out some situation under which in-database sorting will be used?

Sorting in database is invoked by JSON API requests to fetch sorted data from the music library

AlphaNumericCollation is used as a callback function for sorting in SQLite, so care has to be taken over designing function for speed as well as that the end results match that which in memory sorting for GUI display produces.

MySQL/MariaDB databases also need to be tested and again ensure that the results produced from in memory sorting for GUI display match.

DaveTBlake avatar May 01 '22 14:05 DaveTBlake

@taoyouh this needs a rebase

jenkins4kodi avatar Jun 29 '22 03:06 jenkins4kodi