Accent support for ObjectFilters.text

Open Doc1faux opened this issue 3 years ago • 3 comments

Hi @anidotnet :)

As in #144, I've run into an issue with accents, this time when searching text, e.g. a user's first name. My own first name has an accent on the first 'e' (Sébastien), and when I search for it in a user collection in a LIKE-style way with ObjectFilters.text("firstname", "*se*"), it is not returned (obviously, with a "*sé*" search it is ;)). I have already added an index on the searched field with @Indices({ @Index(value = "firstname", type = IndexType.Fulltext) }), so I suppose the only fix would be to add a Collator parameter to the ObjectFilters.text method as well?

Doc1faux, Mar 04 '22 10:03

Diving into the code, I've just found the TextTokenizer and TextIndexingService classes, and, according to your documentation, a way to pass them in at database creation, so that should do the job. I'll test it :)

Doc1faux, Mar 04 '22 10:03

Has it resolved your issue?

anidotnet, Mar 04 '22 11:03

Unfortunately, TextTokenizer isn't helping, as it is only a list of stop words. TextIndexingService could have helped, but a known and stable indexing service for Android, like the Apache Lucene you mentioned in the documentation, does not seem to exist :/ I took a look at the Collator class for this specific case, but "se" and "sé" are still different strings for it, which is expected for the sorting feature. The only solution I've gone for is to add a normalized firstname field to the collection, which is set upon document insertion with Normalizer.normalize(firstname, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "") and used only for searching. The search term is also normalized, obviously.

Doc1faux, Mar 04 '22 12:03
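
For readers who land here with the same problem, the normalization workaround described in the last comment can be sketched in plain Java roughly as follows. This is only an illustration under assumptions: the class name AccentFolding and the helper fold are hypothetical, the lower-casing step is an addition that is not in the original expression, and no Nitrite API is used. It only shows that the stored value and the search term end up comparable once both go through the same folding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class; the name is illustrative and not part of Nitrite.
class AccentFolding {

    // Decompose accented characters (NFD), then drop everything outside ASCII,
    // which removes the combining accent marks. Lower-casing is an extra step
    // added here (an assumption) so that a plain contains() check ignores case.
    static String fold(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "S\u00e9bastien" is the accented first name from the thread.
        String stored = fold("S\u00e9bastien"); // value for the extra normalized field
        String term = fold("s\u00e9");          // the search term is folded the same way

        System.out.println(stored);
        System.out.println(stored.contains(term));
    }
}
```

The folded value would go into the extra search-only field at insertion time, and the same fold would be applied to the user's input before building the filter. Note that stripping all non-ASCII characters also removes letters that have no ASCII decomposition (for example "ø" or "ß"), so this fits names like the one above better than arbitrary text.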