ibus-typing-booster
ibus-typing-booster copied to clipboard
Feature request: fuzzy matching
Do you think it would be hard to support it?
I think that can make typing much easier on average.
Thanks!
There is some degree of fuzzy matching already.
If you have the hunspell dictionaries installed and ibus-typing-booster can find them and python3-enchant is available, you will get spell checking matches. I.e. if you slightly mistype a word, you will still get a match.
On top of that, matching is accent insensitive, i.e. if you type in a language using many accents (Czech, French, German, ...) you can type words without the accents and select the correct words with the accents in most cases. Or even type the accents completely wrong and still get matches of the correct words. That helps a lot if you are learning the language and are not sure about the correct accents.
Is that what you meant?
What I mean is that if I type "hm" I will also get a match for "home", similar to how, for example, https://github.com/junegunn/fzf works.
For something as short as "hm" that will easily give far too many matches. If you have too many matches to choose from, speed will suffer. fzf seems focused on command lines, not human languages.
Can you please elaborate on the speed issues you expect with lots of matches? Also, how is the scoring/ranking done?
I mean the speed issues of the user having to select from too many matches. For example, “hm” could match “him”, “ham”, “hum”, “home”, “hame”, “ohm”, ... and probably many others. If the candidate list shows many matches you have to look at all of them and decide which is the one you wanted. That takes time and just typing the complete word may be much faster then, at least if you are a fast typist. The length of the first page of the candidate list is configurable, it can be between 1 and 9. 9 is already quite a lot. If the desired match is not in the first page of the candidate list and you have to scroll down to the next page or pages, this is almost always slower than just continue typing the complete word.
Lots of matches can be useful only in rare cases, like when trying to find a particular emoji for example.
Swiftkey, a similar program to speed up typing which runs on Android never shows more than 3 candidates. Showing lots of candidates isn’t really useful, more important is to show high quality candidates, i.e. candidates you are really likely to type.
The scoring is done by remembering what the user types often.
For example, if you often type “I want beer” and less often type “I want bread”, the next time you type “I want b” you will see “beer” as the first candidate. I.e. if you type one or more letters (like “b”), ibus-typing-booster calculates which words starting with these letters you typed most often considering the words you typed immediately before that (like “I want”). In Gnome programs, ibus-typing-booster can get that context by asking the program you are typing in for the context left of the cursor, i.e. even if you move the cursor somewhere else using the mouse or the arrow keys, ibus-typing-booster can get the correct context. In non-Gnome programs this doesn’t work and ibus-typing-booster falls back to just remembering the last words you typed, in that case the context may be wrong if you move the cursor and continue typing in a different position.
Try this out by typing a few sentences and then type similar sentences again, you will see that you get words scored high in the candidates which you typed recently in that context.
If you type a word completely but spelled wrong, for example if you type “somwhere” you will get a spell checking suggestion like “somewhere” high up in the candidate list. If you select that candidate “somewhere” and do the same spelling mistake again later, ibus-typing-booster remembers that and after typing “somw” you will already see “somewhere” in the candidate list (Because it remembers that you recently type something starting with “somw” and then selected “somewhere”).
Thanks @mike-fabian for the thorough explanation!
The way I see it, the problem you describe with having to scan a long list of candidates is more about scoring/ranking than matching. If you think about it, (boolean) matching can be viewed as some form of scoring as well (where matches get the score 1 and non-matches get 0, assuming you show the user all words with score > 0). So if the scoring works well, even if there are lots of matches, the right matches will be on top. For example, given a user query of "ho", it probably makes sense to rank "home" higher than "through", even though both match fuzzily. Now, getting the scoring perfect is an unsolved problem. But it may be feasible to make the scoring good enough so that fuzzy matching will be a positive change.
@mike-fabian @infokiller I have extracted the Android auto-suggestion and correction library from the last open source release of the Android (now Google) keyboard. It is basically a C++ library.
I am working on nice Python bindings. You can check it out here:
https://github.com/wolfv/suggestr
My initial version from a couple years ago also had a DBus Server, which we could easily revamp to make it work. I don't know the exact model how ibus-typing booster retrieves suggestions but I think using the Android keyboard sources is ideal. It is very fast, supports the usage of context, comes with a couple of good dictionaries (that have probabilities).
Maybe someone has a cycle or two to try to get it to work together? I think this would be a game-changer, not only for ibus-typing-booster, but also for touchscreen keyboards in Gnome (and upcoming phones).
@wolfv sounds great, I agree that the Android keyboard is a good direction!
@mike-fabian @infokiller I have extracted the Android auto-suggestion and correction library from the last open source release of the Android (now Google) keyboard. It is basically a C++ library.
I am working on nice Python bindings. You can check it out here:
https://github.com/wolfv/suggestr
My initial version from a couple years ago also had a DBus Server, which we could easily revamp to make it work.
What did you use a dbus server for?
I don't know the exact model how ibus-typing booster retrieves suggestions but I think using the Android keyboard sources is ideal. It is very fast, supports the usage of context, comes with a couple of good dictionaries (that have probabilities).
Maybe someone has a cycle or two to try to get it to work together? I think this would be a game-changer, not only for ibus-typing-booster, but also for touchscreen keyboards in Gnome (and upcoming phones).
By "get it to work together", do you mean combining suggestr with ibus-typing-booster?
@mike-fabian @infokiller I have extracted the Android auto-suggestion and correction library from the last open source release of the Android (now Google) keyboard. It is basically a C++ library.
That means the current version is not open source anymore?
I am working on nice Python bindings. You can check it out here:
https://github.com/wolfv/suggestr
My initial version from a couple years ago also had a DBus Server, which we could easily revamp to make it work. I don't know the exact model how ibus-typing booster retrieves suggestions but I think using the Android keyboard sources is ideal. It is very fast, supports the usage of context, comes with a couple of good dictionaries (that have probabilities).
Maybe someone has a cycle or two to try to get it to work together? I think this would be a game-changer, not only for ibus-typing-booster, but also for touchscreen keyboards in Gnome (and upcoming phones).
I’ll have a look.
But I was never really impressed by the prediction quality of the Android keyboard, SwiftKey seems to work much better.
yes, at some point in time google decided to create a close-sourced fork of the keyboard, and rebrand it as Google Keyboard. Some features (such as swipe-to-type) have never been available in the open source version, although some code fragments for this feature can be found in the source code. However, the sources for the original Android keyboard can still be found and are also still maintained.
Well ... if you find the sources for SwiftKey somewhere, sure, that might be better. But it's not a trivial piece of software so I think we should work with what we got :) Everything is going to be quite a step up from the current approach of using Hunspell (which doesn't factor in the keyboard layout, for example, and doesn't offer any real predictive capabilities, as well as not taking into account the context).
I'd be excited if you give it a shot, even though my stuff is also not really "finished".
Regarding the DBus server: this was just an experiment to create some API to the code. I used it to get suggestions into UberWriter (as a prototype). The Python interface will be much more powerful and we'll be able to create DBus or other bindings going from there.
Factoring in the keyboard layout is more difficult on the desktop because you don't know what the keyboard layout actually is, usually.
The current approach is not only hunspell, that is a sort of last ditch fallback if no learned data is available yet. And of course hunspell gives spellchecking suggestions. Preditions in ibus-typing-booster at the moment rely mostly on what the user has typed in the past already and it does take context into account. Only if nothing can be found in the user database for that context, hunspell predictions end up at the top of the prediction list. Usually I see predictions from my user database at the top of the prediction list.
Ok, then I probably haven't tried it for long enough, sorry about that.
Locally I have example code for the suggestr
package that allows one to build up user dictionaries (with probabilities etc) as well, there is some pretty advanced code in the keyboard sources to allow that. The bigger problem is that this is all largely undocumented, so it's a bit like reverse-engineering.
I think most keyboard layouts are fairly standard, and the layouts are also configurable. Exact matching layouts might not be required, but the knowledge which keys are next to each other can help a good bit for getting fitting results IMO (that's likely also how SwiftKey and anybody else does it).
Some keyboards like the French one for example are very different.
Sure - you can dynamically switch Layouts with the android keyboard sources. here is (the only) currently included definition: https://github.com/wolfv/suggestr/blob/master/prediction/key_set.h
Will need to figure out how to properly modify this to support other layouts then QWERTY but it can certainly be done :)
Also, of course, touch and regular keyboards are somewhat different, so maybe it would be good to tune some other geometric values -- but on the other hand I think out of the box this will give nice predictions.
Here is more work (apparently with dynamic keyboard layouts) in the Chromium sources: https://chromium.googlesource.com/chromiumos/platform/suggest/+/refs/heads/master/src/demo.cc
@mike-fabian I have just added support for reading in arbitrary layouts (in Python), you can find an example Notebook using the German keyboard layout here: https://github.com/wolfv/suggestr/blob/master/notebooks/Example%20Usage.ipynb
And the layouts are stored here: https://github.com/wolfv/suggestr/blob/master/layouts/de.yaml
Note that these layouts are taken from the squeekboard repo (the Librem 5 soft keyboard). The code to read in the layout is pretty straight forward and we could adjust it to any sort of keyboard layout definition. And we could also make a translator from the original YAML layout definition to a simpler, pre-computed file format that can be easily parsed from within C++.
Let me know what you think!
@mike-fabian did you ever have a chance to look at suggestr?