Option to prioritize exact matches over fuzzy ones

Open yangm97 opened this issue 4 years ago • 13 comments

yangm97 avatar Aug 09 '21 15:08 yangm97

I assume you mean when MCFLY_FUZZY is enabled?

cantino avatar Aug 19 '21 22:08 cantino

yes

yangm97 avatar Aug 21 '21 15:08 yangm97

Seems reasonable.

cantino avatar Aug 23 '21 15:08 cantino

I like fuzzy matching so that I don't have to remember the exact punctuation in a file name. But I'm finding that using it completely removes the claimed benefits of McFly's intelligence:

[image]

The command I want is the one I ran 10 minutes ago, at position 6. The top choice isn't even the best fuzzy match for the words I've typed. Am I doing something wrong? I have these settings:

$ env | grep MCFLY
MCFLY_FUZZY=true
MCFLY_RESULTS=30
MCFLY_HISTORY_LIMIT=10000
MCFLY_SESSION_ID=UzHARz6EfjOqhxy8VIvT9Bcg
MCFLY_HISTORY=/var/folders/10/4sn2sk3j2mg5m116f08_367m0000gq/T/mcfly.XXXXXXXX.EbZr6lxT

nedbat avatar Sep 06 '21 11:09 nedbat

I don't personally use fuzzy matching because I agree that it's of lower quality.

cantino avatar Sep 06 '21 19:09 cantino

Is there some way to improve it? It seems a shame to offer a setting which seems to negate the primary claim of the tool (intelligent history).

nedbat avatar Sep 06 '21 20:09 nedbat

I'd be open to contributions that improve it. It was contributed by a user and isn't a feature I use myself. I prefer the non-fuzzy matching for how I tend to use mcfly.

cantino avatar Sep 11 '21 19:09 cantino

@nedbat matches are weighted by length, per https://github.com/cantino/mcfly/pull/103#issuecomment-720139246

Having used it for a while myself, I agree the balance could stand to shift further towards shorter matches. The easiest tweak is to add a FUZZY_FACTOR to that weighting algorithm -- even better if it's configurable, so many people can try out different factors and speed up the process of converging on a generally useful default.

dmfay avatar Oct 27 '21 03:10 dmfay

With 0.5.10 just out the fuzzy experience should be dramatically improved. If you have MCFLY_FUZZY=true you'll start with a "fuzzy factor" of 2, or you can set the environment variable to another integer value. Higher values of MCFLY_FUZZY favor shorter and earlier matches; 0 turns it off.

In my testing a fuzzy factor of 1 didn't do quite enough to prioritize what I was searching for, and 10+ weighed brevity and start position too heavily over the built-in rank. As I mentioned in the readme I expect the best results to be in the 2-5 range, but I also only have my own history to test with. If you have the time to try a few different settings please report how it works out for you and what MCFLY_FUZZY value you settle on!

dmfay avatar Nov 06 '21 20:11 dmfay
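
A minimal sketch of the MCFLY_FUZZY rule as described in the comment above: "true" maps to a fuzzy factor of 2, an integer is used as given, and 0 turns fuzzy matching off. The function name `parse_fuzzy_factor` is hypothetical, and the handling of unset, negative, or unrecognized values is an assumption; this is not mcfly's actual parsing code.

```python
import os


def parse_fuzzy_factor(value):
    """Map an MCFLY_FUZZY-style string to an integer fuzzy factor (0 = off)."""
    if value is None:
        return 0  # assumption: an unset variable means fuzzy matching is off
    text = value.strip().lower()
    if text == "true":
        return 2  # default factor stated in the comment above
    try:
        return max(int(text), 0)  # assumption: negative values are treated as off
    except ValueError:
        return 0  # assumption: anything unrecognized disables fuzzy matching


# Read the factor from the current environment.
factor = parse_fuzzy_factor(os.environ.get("MCFLY_FUZZY"))
```

Under these assumptions, `MCFLY_FUZZY=true` and `MCFLY_FUZZY=2` behave the same, and values in the suggested 2-5 range pass through unchanged.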

Maybe I'm not getting it, but for me fuzzy search does not work, at least not as I'd expect it to. Having set export MCFLY_FUZZY=2 (same for 1), I press ctrl-r in bash and type "" and it finds something, but the first command in the list does not contain it at all. Shouldn't it contain it?

I used the fzf command for searching the history file before, and I like its syntax; might it work here as well? Space is the delimiter and each entered word has to be present in the line, with the possibility to negate a word using !word; searches can be made case sensitive/insensitive, etc.

alfonz19 avatar Aug 30 '22 08:08 alfonz19
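
The term-based matching described in the comment above could be sketched roughly as follows: the query is split on spaces, every plain term must occur somewhere in the line, and a term prefixed with `!` must not occur. The function name `matches_terms` and the plain substring test are assumptions made for illustration; this is a simplification of the idea and reproduces neither fzf's full syntax nor anything mcfly implements.

```python
def matches_terms(query, line, case_sensitive=False):
    """Return True if every plain term occurs in line and no !term does."""
    if not case_sensitive:
        query, line = query.lower(), line.lower()
    for term in query.split():
        if term.startswith("!") and len(term) > 1:
            # Negated term: reject the line if the word is present.
            if term[1:] in line:
                return False
        elif term not in line:
            # Plain term: it must appear somewhere, in any position.
            return False
    return True


# Word order in the query does not matter.
print(matches_terms("commit git", "git commit -m fix"))
print(matches_terms("git !commit", "git commit -m fix"))
```

Because each term is checked independently, the order in which the words are typed has no effect on whether a line matches.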

I thought it might have something to do with quotes, but no, that seems to work:

[image]

double-check env | grep -i mcfly ?

dmfay avatar Aug 30 '22 13:08 dmfay

Double-checked: [image]

It was just my bad expectation.

Explanation: if I have the line "Pretty horse finished last", then the following search strings will match: "pest", "p e st" or "p e st". So it probably means that when fuzzy search is on, we can think of the entered search string as a regular expression where ".*" is automatically added after each character.

That's fine, yet I'd say this blind fishing is less beneficial than interpreting the input as "there is some command with the word 'pest' in it". Sure, for trivial stuff it does not matter. And if I'm searching for two words, I can use the search string word1%word2, but in that case I have to know the correct order of the words in the command. I'd say that for long commands with lots of options, the longer the command is, the less useful a match on single arbitrary characters becomes, and the harder it is to remember the correct order of the words to search for.

In this specific case it would be more beneficial to interpret the entered search pattern "word1 word2 word3" as "search for lines containing all three of these words, in any order". Maybe this is already covered (sorry about this comment in that case); I just thought it was done via this option.

alfonz19 avatar Aug 30 '22 13:08 alfonz19
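
The regular-expression reading proposed in the comment above can be checked with a short sketch. The helper name `fuzzy_pattern` is hypothetical, case-insensitive matching is an assumption, and ".*" is placed between characters rather than after each one, which accepts the same lines when searching. It only illustrates the commenter's mental model, not how mcfly matches internally.

```python
import re


def fuzzy_pattern(query):
    """Build a regex that allows any characters between the typed characters."""
    return re.compile(".*".join(re.escape(ch) for ch in query), re.IGNORECASE)


line = "Pretty horse finished last"
for query in ("pest", "p e st", "horse", "tsep"):
    # True when the characters of query appear in line in the same order.
    print(query, bool(fuzzy_pattern(query).search(line)))
```

Only in-order character sequences match under this model, so a query whose characters appear in the line in a different order is rejected.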

> So it probably means that when fuzzy search is on, we can think of the entered search string as a regular expression where ".*" is automatically added after each character.

Not quite: it also prioritizes shorter and earlier matches (higher MCFLY_FUZZY numbers make this more important relative to the other ranking criteria). Your example search "pest" runs the length of the entire string, because the only t after an s is in the word "last", so it doesn't do very well on match length.

Borrowing your example further, pretty will match very well, and prt very slightly better as it's shorter (but it's more ambiguous, and you might see other results you aren't looking for); hrse will also do alright. finished/fnshd isn't as good a search because it's both later and "wider" -- preto will be much more effective than that, having the same width of 8 and a start position of 0 instead of 14.

dmfay avatar Aug 30 '22 23:08 dmfay
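
To make "shorter and earlier" concrete, here is a rough sketch that finds where a query matches as an in-order character sequence and how wide that match is. The function name `shortest_span`, the lowercase comparison, the 0-based start, and counting width inclusively are all assumptions for illustration; mcfly's own matching and counting may differ, so the exact numbers need not agree with those quoted in the comment above.

```python
def shortest_span(query, line):
    """Return (start, width) of the narrowest in-order match of query in line,
    preferring the earliest start on ties, or None if there is no match."""
    q, text = query.lower(), line.lower()
    if not q:
        return None
    best = None
    for start, ch in enumerate(text):
        if ch != q[0]:
            continue
        pos = start
        for want in q[1:]:
            # Greedily take the next occurrence of each remaining character.
            pos = text.find(want, pos + 1)
            if pos == -1:
                break
        else:
            width = pos - start + 1
            if best is None or width < best[1]:
                best = (start, width)
    return best


line = "Pretty horse finished last"
for query in ("pest", "pretty", "prt", "hrse"):
    print(query, shortest_span(query, line))
```

A ranking that favors small `start` and `width` values would place "pretty" or "prt" well ahead of "pest", whose match stretches from the first character of the line to the last.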