Zimpedia
Case-insensitive search
Hi, I think it would make sense to build in an option for case-insensitive search.
Yes, it makes total sense. Unfortunately, there is no easy way to implement it. A ZIM file, like Wikipedia itself, contains a case-sensitive title index.
The only possible implementation I see right now is to emulate case insensitivity by doing multiple searches. For instance, when you type "case sensitivity", Zimpedia should make at least 4 queries: "case sensitivity", "Case sensitivity", "Case Sensitivity", "case Sensitivity". Then the 4 result sets need to be merged, duplicates eliminated, etc.
Here is a fitting quote from a Wikimedia article that summarizes the problem:
Case sensitivity in MediaWiki is both a blessing and a curse
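For illustration, here is a minimal Python sketch of the multi-query emulation described above. It is not Zimpedia's code: `title_search` is a hypothetical callable standing in for a case-sensitive title lookup, and only the first letter of each word is varied, as in the "case sensitivity" example.

```python
from itertools import product


def case_variants(query):
    """Return the query with each word's first letter in lower and upper case."""
    options = []
    for word in query.split():
        lower = word[:1].lower() + word[1:]
        upper = word[:1].upper() + word[1:]
        # dict.fromkeys drops the duplicate when both forms are identical
        # (e.g. a word that starts with a digit) and keeps the order stable.
        options.append(list(dict.fromkeys([lower, upper])))
    return [" ".join(combo) for combo in product(*options)]


def search_case_insensitive(query, title_search):
    """Run one case-sensitive search per variant, then merge and de-duplicate.

    title_search is a hypothetical callable: str -> list of matching titles.
    """
    seen = set()
    merged = []
    for variant in case_variants(query):
        for title in title_search(variant):
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged
```

Note that the number of queries doubles with every word in the query, which is part of why this approach gets expensive quickly.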
@popanz @mkiol We are currently working at Kiwix to improve, and maybe fully fix, this problem. The solution will be implemented in zimlib and probably also partly in kiwix-lib. But kiwix-lib is now a separate git repo, which can easily be reused.
@kelson42 That is great news! Zimpedia uses zimlib, so the implementation in zimlib interests me the most. Is there an issue number I could follow to stay informed about the implementation status?
- https://github.com/kiwix/kiwix-lib/issues/28
- https://github.com/kiwix/kiwix-lib/issues/29
The question of introducing a fully case-insensitive suggestion system is still open on my side. For now we are basically trying to generalise the full-text (case-insensitive) search.
I never really noticed this in Kiwix on Android because there the keyboard starts in lowercase (in Kiwix search, not in a general text field). So maybe there's a simple way to set something like `keyboard.autocaps = false` for the search field?
On a slightly related note, but perhaps this should be its own FR, it'd be nice if `eleve` could also find `élève`. Typing all those accents is a bit of a pain on a phone.
> So maybe there's a simple way to set something like `keyboard.autocaps = false` for the search field? It'd be nice if `eleve` could also find `élève`
Unfortunately, it is not so simple right now. A search result retrieved from libzim is case sensitive, so to achieve what you've suggested, several searches with different variants (élève, elève, éleve, eleve, elevé, etc.) would have to be made. All results would then have to be de-duplicated and merged. It is possible but complicated...
As @kelson42 suggested, maybe you should try the full-text search mode (this feature was added in the recent Zimpedia update). It is case insensitive, but unfortunately the results can sometimes be unpredictable.
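To give a rough idea of why the variant approach gets complicated for accents, here is a small Python sketch that enumerates accent variants of a word. The `ACCENTS` table is an assumption made up for this example and is deliberately tiny; a realistic table would be much larger.

```python
from itertools import product

# Assumed, deliberately small table of accented forms per base letter.
ACCENTS = {"e": "eéèêë", "a": "aàâ", "c": "cç"}


def accent_variants(word):
    """Return every spelling obtained by swapping letters for accented forms."""
    # Letters without an entry in the table have only themselves as a choice.
    choices = [ACCENTS.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*choices)]
```

With this table, `eleve` alone already expands to 5 × 5 × 5 = 125 spellings, each of which would need its own case-sensitive query before merging.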
Those two paragraphs shouldn't be taken together like that. ;-)
I was slightly wrong about present-day Android Kiwix. I suspect it might perform two searches, one lowercase and one uppercase. (But certainly not full text.)
But what I was talking about is that in Android Kiwix, the keyboard opens lowercase, like this. I know Sailfish can do that too, e.g., for the browser address bar.
In Zimpedia, it opens like this:
So if you just start typing a word like `bear`, you'll unintentionally search for `Bear`. Then you'll only find, say, `Bearbeitung`. With the exception of German, I think that defaulting to lowercase would largely resolve the issue without any change to the actual searching code. And presumably that's just a simple flag somewhere.
Kiwix will not, of course, find the word élève by typing `[Ee]leve`. That's just something I'd like it to do. There are fairly standard Unicode algorithms for that, I believe.
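One such standard building block is Unicode normalization: decompose each character (NFD) and drop the combining marks. The Python sketch below shows the idea using the standard `unicodedata` module. The `find_folded` helper is hypothetical and scans a plain list of titles; it says nothing about how libzim or Kiwix index titles, and it only helps if the titles being compared are folded the same way as the query.

```python
import unicodedata


def fold(text):
    """Strip diacritics and case so that 'Élève' and 'eleve' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD, accents are separate combining characters that can be dropped.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_folded(query, titles):
    """Hypothetical prefix search that compares folded forms of both sides."""
    key = fold(query)
    return [title for title in titles if fold(title).startswith(key)]
```

This handles accents that decompose into a base letter plus a mark; letters such as `ø` or `ß` do not decompose this way and would need extra rules.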
Thanks for the clarification. I will look into it.
In 3ab979ea62c38585d45198369356e73d15d753d9 I've added the following search procedure:
- first search with upper case first letter
- second search with lower case first letter
- results are merged and sorted case insensitive
It works pretty well... but the `élève` case is far more complex...
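As a rough sketch of the three steps listed above (written in Python for illustration, not taken from the commit), assuming a hypothetical `title_search` callable that performs one case-sensitive title lookup:

```python
def first_letter_search(query, title_search):
    """Search with upper- and lower-case first letter, merge, sort ignoring case.

    title_search is a hypothetical callable: str -> list of matching titles.
    """
    if not query:
        return []
    upper = query[0].upper() + query[1:]
    lower = query[0].lower() + query[1:]
    results = list(title_search(upper))
    if lower != upper:
        # Skip the second query when the first character has no case (e.g. a digit).
        results += title_search(lower)
    # Merge by removing duplicates, then sort case-insensitively; the title
    # itself is the tie-breaker so the order is deterministic.
    return sorted(set(results), key=lambda title: (title.casefold(), title))
```

Typing `bear` with this procedure would return both `Bear...` and `bear...` titles, whichever case the keyboard started in.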
Very nice! :+1:
> but the `élève` case is far more complex...
Well yeah, diacritics are hard. ;-) I'll have to check what GoldenDict does because you've made me curious. Iirc it performs a variety of clever tricks with diacritics and morphology alike.