Use SQLite ICU extension to enable case-insensitive searches for non-Latin characters

Open • BenJamesBen opened this issue 3 months ago • 1 comment

This is an enhancement request for Anki to consider using the SQLite ICU extension. As I understand it, doing so would make simple searches case-insensitive for all characters. Currently, simple search is case-insensitive only for Latin characters.

Using the ICU extension may make searches slower. If the increase in search time is too great, this enhancement request should probably be rejected and closed.

https://www.sqlite.org/lang_expr.html#like

SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. [...] The ICU extension to SQLite includes an enhanced version of the LIKE operator that does case folding across all unicode characters.
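The default behavior quoted above can be checked against whatever SQLite library a given Python build links to. This is only a minimal sketch: `like_matches` is a hypothetical helper name, the sample words are arbitrary, and it does not use Anki's own database layer. On a build without ICU the second check would be expected to report no match, but that depends on how the library was compiled.

```python
import sqlite3


def like_matches(conn, value, pattern):
    """Return True if `value LIKE pattern` holds in this SQLite build."""
    row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")

# ASCII letters are case-folded by the built-in LIKE operator.
ascii_folds = like_matches(conn, "ANKI", "anki")

# Cyrillic letters are outside the ASCII range, so the result here
# depends on whether the linked SQLite was built with the ICU extension.
cyrillic_folds = like_matches(conn, "ЯБЛОКО", "яблоко")

print("ASCII case-insensitive:", ascii_folds)
print("Cyrillic case-insensitive:", cyrillic_folds)
```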

https://sqlite.org/src/dir/ext/icu

[...] the built-in SQLite LIKE operator understands case equivalence for the 26 letters of the English language alphabet. The implementation of LIKE included in this extension uses the ICU function u_foldCase() to provide case independent comparisons for the full range of unicode characters.
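To illustrate the kind of behavior the ICU version of LIKE provides, here is a rough sketch that overrides SQLite's `like()` function from Python. It is not the ICU extension: Python's `str.casefold()` stands in for `u_foldCase()` and may differ from it in edge cases, `unicode_like` is a hypothetical name, and the ESCAPE clause is not handled. SQLite implements `Y LIKE X` by calling `like(X, Y)`, so the pattern arrives as the first argument.

```python
import re
import sqlite3


def unicode_like(pattern, value):
    """Case-folding stand-in for like(pattern, value); no ESCAPE support."""
    if pattern is None or value is None:
        return None  # LIKE with a NULL operand yields NULL
    # Translate LIKE wildcards into a regex: % -> any run, _ -> one character.
    parts = []
    for ch in str(pattern).casefold():
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    matched = re.fullmatch(regex, str(value).casefold(), re.DOTALL)
    return int(matched is not None)


conn = sqlite3.connect(":memory:")
# Registering a two-argument like() replaces the built-in LIKE operator
# for this connection only.
conn.create_function("like", 2, unicode_like)

row = conn.execute("SELECT 'ЯБЛОКО' LIKE '%блок%'").fetchone()
print(row[0])
```

One point relevant to the performance concern: SQLite's documentation states that its LIKE optimization only applies when the built-in `like()` has not been overloaded, so any replacement (ICU or otherwise) may give up index-assisted prefix matching. Whether that matters for Anki's searches would need measuring.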

https://docs.ankiweb.net/searching.html#simple-searches

Standard searches are case insensitive for Latin characters - a-z will match A-Z, and vice versa. Other characters such as Cyrillic are case sensitive in a standard search, but can be made case insensitive by searching on a word boundary or regular expression (w:, re:).

Although the current Anki search behavior is documented, it led a user to believe they had found a bug and to post about it on the forum: https://forums.ankiweb.net/t/the-problem-with-the-register-upper-lowercase-searching/66105

Aug 31 '25 21:08 BenJamesBen

Related: https://github.com/ankitects/anki/issues/1979

Sep 02 '25 19:09 abdnh