nitrite-java
Accent support for ObjectFilters.text
Hi @anidotnet :)
As in #144, I ran into an issue with accents, but this time when searching for text, e.g. a user's first name.
For instance, my own first name has an accent on the first 'e' (Sébastien), and when I search for it in a user collection in a LIKE way with ObjectFilters.text("firstname", "*se*"), it is not returned (obviously, with a "*sé*" search it is ;)).
I already added an index on the searched field with @Indices({ @Index(value = "firstname", type = IndexType.Fulltext) }), so I suppose the only fix would be to add a Collator parameter to the ObjectFilters.text method as well?
Diving into the code, I've just found the TextTokenizer and TextIndexingService classes, and according to your documentation they can be passed in when creating the database, so that should do the job. I'll test it :)
Has it resolved your issue?
Unfortunately, TextTokenizer doesn't help, as it is only a list of stop words.
TextIndexingService could have helped, but a known and stable indexing service for Android, like the Apache Lucene you mentioned in the documentation, does not seem to exist :/
I took a look at the Collator class for this specific case, but se and sé are still different strings for it, which is expected for sorting.
The solution I've gone with is to add a normalized firstname field to the collection. It is set upon document insertion with Normalizer.normalize(firstname, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "") and is used only for searching. The search term is normalized the same way, obviously.
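For anyone landing here, the normalization step can be sketched as a small helper. This is only a minimal sketch of the workaround described above: the class name AccentFolding and the method name fold are hypothetical, and only java.text.Normalizer from the JDK is used (nothing from Nitrite itself).

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Nitrite: folds accented characters to plain ASCII.
final class AccentFolding {

    private AccentFolding() {
    }

    // Decomposes accented characters (NFD turns "é" into "e" plus a combining accent),
    // then removes every character that is not ASCII, which drops the combining marks.
    static String fold(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }
}
```

The same fold call would be applied both to the value stored in the extra search-only field and to the search term before it is passed to the filter. One caveat: characters with no decomposition (for example "ø") are removed entirely rather than mapped to a base letter, so this suits names with Latin accents better than arbitrary text.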