atuin
atuin copied to clipboard
Improper prioritization using fuzzy search
It seems that the fuzzy search is improperly prioritizing despite changes made in https://github.com/ellie/atuin/pull/179
For example:
The following interactive query tg plan
will result in:
Despite the fact that there are results for tg plan -out
that should be of much higher priority:
In fact, the tg plan
command does not even appear anywhere within the list of 50+ items
atuin version : 12.0.0
After a bit more testing, it seems that the results of atuin search "tg plan"
are actually different from the ones in interactive mode (not sure why that is the case) and seem a bit better:
Although there is still weird fuzzy searching behavior there.
For example, why would k get ing publicapi-ingress
be ranked higher than tg apply aks-tfplan
? The latter contains the full sequential tg
and plan
.
Same thing for npm list -g | grep -i lang
. I don't see why it would rank above tg apply kv-tfplan
which once again contains both tg
and plan
because the minimum span of the first one is shorter than the second, and there is no prioritisation applied for contiguous letters.
ghci> length "t ing publicapi-in"
18
ghci> length "tg apply aks-tfplan"
19
If you like a different algorithm you could certainly write one.
Thank you for the clarification on the current algorithm. I guess this is subjective, but I do think most users would expect the 2nd choice to be chosen in that situation (despite one being shorter).:
ghci> length "t ing publicapi-in"
18
ghci> length "tg apply aks-tfplan"
19
Would it not be possible to re-use fzf
, skim
or others instead of atuin trying to write its own fuzzy algorithm from scratch?
Also, in regards to the different results between the search command and interactive mode shown in the original post, is that expected?
It's a fair criticism. I think there will always be complaints that the fuzzy search doesn't get exactly what you want.
I don't mind swapping out the fuzzy implementation, but first and foremost it needs to be fast. Reading 100000+ history entries from the database and shelling them out to fzf will be too slow for us.
I'm considering putting together a demo of skim. But I'm more interested in switching to tantivy for a full-text search engine.
Ok, on my 40000 entries, reading the entire database and removing duplicates takes 140ms. Each query of those 40000 entries takes 15ms using skim. It's not awful. you certainly can run the query and render it in a couple frames.
@jemag if you're willing, can you try #695 and report if you find the results subjectively better for your tastes? Is there any noticeable change in performance?
Sure, and thanks for going through this experimentation.
I am a bit busy at the moment, but will definitely find some time this week to try it.
PR works quite well for me. I do feel like the results match more closely to traditional fuzzy finders (and what many users may have come to expect over the years). In terms of interactive performance, I really could not notice it with my ~36k
entries, although if I time a non interactive search it does show up as slightly slower.
In both situations (PR or not) the results from an interactive and a non-interactive search are slightly different, is there a particular reason for this?
Most of the time I want to do an exact match, so in the past fzf with the --exact
option worked perfectly for me. This option inverts the meaning of the '
prefix, so that by default it does an exact match unless you use the '
prefix. I do want to be able to match words in an arbitrary order though to narrow down the results, so the fulltext
option doesn't work for me.
I was actually tempted to investigate adding this kind of option to atuin, but if there's potentially going to be a very different matcher with tantivy in the future then I'm not sure if it makes sense to spend time on that now.
I also gave the "skim" option a try, but with my ~360k entries the initial search was noticeably slower, and for some reason I can't get the '
prefix to work with it even though its documentation says that it is supported.
like @majutsushi I normally prefer exact matches in arbitrary order, I rarely do typos so my need for fuzzy search is not very high. And just today I had a case where I have two servers, "abc.domain.com" and "abctest.domain.com", when I typed "ssh abc.domain.com" in the search, the best match was still "ssh abctest.domain.com", which just feels weird since it was an exact match. The line I typed was still the second match, so it wasn't far away, but still strange.
So, for me, I would prefer to see exact matches ranked much higher than fuzzy matches, or modify fulltext to treat space like AND to get exact matches in arbitrary orders... that would also be more in line with what I expect from a fulltext search.
A prefer_exact_match
config flag that reverses the behavior of ' would be great!