The search string should be used in determining the score

Open lefth opened this issue 3 years ago • 11 comments

Right now, the frequency of use determines which directory is chosen when there are multiple matches. I suggest the closeness of the match should also be used, and that should take precedence over the frequency score.

Why? Because the user can control zoxide's behavior by typing a more specific name. Also because it's very unexpected when I type z Cat and zoxide takes me to "duplicates". It's unintuitive that the search string is used to filter results but not to select a result.

I suggest directories should be ranked based on how good a match they are with the search term, and the frequency score can be used to break ties. Or the frequency score could be incorporated with a lower weight: total score = match quality * 0.8 + frequency score * 0.2.
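As a rough sketch of the weighted variant, assuming both inputs are already normalized to the range 0 to 1 (the function name, the paths, and the numbers below are made up for illustration and are not zoxide's actual API or data):

```python
def total_score(match_quality: float, frequency: float,
                match_weight: float = 0.8) -> float:
    """Blend match quality and frequency score.

    Both inputs are assumed to be normalized to [0, 1]; the default
    weights are the 0.8 / 0.2 split suggested above.
    """
    return match_quality * match_weight + frequency * (1.0 - match_weight)


# Hypothetical candidates for `z Cat`: (path, match quality, frequency).
candidates = [
    ("/home/me/duplicates", 0.3, 1.0),  # weak match, visited very often
    ("/home/me/Cat", 1.0, 0.2),         # perfect match, visited rarely
]

best = max(candidates, key=lambda c: total_score(c[1], c[2]))
print(best[0])
```

With these made-up numbers the perfect match wins even though it is visited far less often. How to normalize the frequency score is left open here.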

lefth avatar Sep 07 '21 13:09 lefth

Preliminary suggestion about the rules, which should be adjusted to avoid wrong behavior:

1. The best matches would match the case (or smartcase) and would match the search term as a word, like \bTheSearchTerm\b
2. The next best matches would be as above, but the case need not match.
3. The next best matches would match the case (or smartcase) and would match the search term on a left word boundary.
4. The next best matches would be as above, but the case need not match.
5. The next best matches would be directories that have the search as a substring (case matching).
6. As above, but case not matching.
7. Optional: The worst matches would be fuzzy matches, possibly using fzf or sk.

(Case changes should also be considered a word boundary, such that "HelloWorld" and "helloWorld" are each counted as two words.)
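The tiers above could be sketched roughly like this. It is a simplified illustration, not the code from any fork: smartcase and the optional fuzzy tier are left out, "case matching" means an exact-case occurrence, and the helper names are hypothetical.

```python
def word_edges(name):
    """Return (starts, ends): indices where words begin and end (exclusive).

    A word edge is a switch between alphanumeric and other characters,
    or a lower-to-upper case change, so "HelloWorld" counts as two words.
    """
    starts, ends = set(), set()
    for i, ch in enumerate(name):
        if not ch.isalnum():
            continue
        prev = name[i - 1] if i > 0 else ""
        nxt = name[i + 1] if i + 1 < len(name) else ""
        if not prev.isalnum() or (prev.islower() and ch.isupper()):
            starts.add(i)
        if not nxt.isalnum() or (ch.islower() and nxt.isupper()):
            ends.add(i + 1)
    return starts, ends


def occurrences(haystack, needle):
    """Yield every index at which needle occurs in haystack."""
    i = haystack.find(needle)
    while i != -1:
        yield i
        i = haystack.find(needle, i + 1)


def match_tier(name, query):
    """Return 0 (best) to 5 (worst), or None if query does not match.

    Even tiers are exact-case matches; odd tiers ignore case.
    Assumes lowercasing keeps the string length (true for ASCII).
    """
    if not query:
        return None
    starts, ends = word_edges(name)
    best = None
    for case_penalty, hay, needle in (
        (0, name, query),
        (1, name.lower(), query.lower()),
    ):
        for i in occurrences(hay, needle):
            if i in starts and i + len(needle) in ends:
                kind = 0  # whole word
            elif i in starts:
                kind = 2  # left word boundary only
            else:
                kind = 4  # plain substring
            tier = kind + case_penalty
            if best is None or tier < best:
                best = tier
    return best


for name in ["Cat", "cat", "Catalog", "catalog", "duplicates"]:
    print(name, match_tier(name, "Cat"))
```

Under these assumptions, `z Cat` would put "Cat" in the best tier and "duplicates" in the worst one, which is the behavior asked for at the top of this issue.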

lefth avatar Sep 07 '21 13:09 lefth

@ajeetdsouza What do you think about this change? Is it a direction you'd like this program to go in?

lefth avatar Sep 11 '21 18:09 lefth

Hey @lefth!

I agree with what you're saying. One thing that makes zoxide easy to use is that the query algorithm is very simple and predictable, but there is definitely room for improvement there.

Smartcase is relatively easy to do, I can pick it up in the near future. Tuning the algorithm to work well in all cases is a harder problem -- I'll try out a couple of ideas once I get the time.

ajeetdsouza avatar Sep 11 '21 21:09 ajeetdsouza

Perhaps add an option that enables a different way of finding the directory? For example, it could check whether there is a directory called X and, if there is, cd to that.

Milo123459 avatar Nov 07 '21 16:11 Milo123459

Not quite sure I understood. Could you elaborate with an example?

ajeetdsouza avatar Nov 07 '21 16:11 ajeetdsouza

Say you have rust with a lower score than rust-analyzer. Currently, when you type z rust it'll take you to rust-analyzer. The option I'm suggesting would check the whole list of stored directories to see if there is one called rust; if there is, it would cd into that directory. If not, it would cd into the other one (in this case, rust-analyzer).
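A minimal sketch of that idea, assuming the database is just a list of (path, score) pairs and that matching is a plain substring test (zoxide's real matching rules are more involved; `pick` and the paths are hypothetical):

```python
from pathlib import PurePosixPath


def pick(entries, query):
    """Prefer a directory named exactly `query`; else the top-scored match.

    entries: list of (path, score) pairs.
    """
    matches = [(path, score) for path, score in entries if query in path]
    if not matches:
        return None
    # Keep only matches whose last path component equals the query.
    exact = [(p, s) for p, s in matches if PurePosixPath(p).name == query]
    pool = exact or matches  # fall back to every match if none is exact
    return max(pool, key=lambda entry: entry[1])[0]


entries = [
    ("/home/me/src/rust-analyzer", 50.0),
    ("/home/me/src/rust", 10.0),
]
print(pick(entries, "rust"))
print(pick(entries, "rust-a"))
```

Here `z rust` lands in rust despite its lower score, while a query with no exact directory name still falls back to the highest score.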

Milo123459 avatar Nov 07 '21 16:11 Milo123459

@ajeetdsouza do you have any plans to tackle this? If not, could you point me to the files I would need to look at to make this change? I would love to make a PR for this if you don't have time to do it yourself. I work with bevy a lot and all my projects are prefixed with bevy, which makes it pretty much impossible to go to bevy when I type z bevy, so I really want a fix for this.

IceSentry avatar Jul 23 '22 02:07 IceSentry

By the way, I made a working implementation of this feature in my fork, but this repo's author and I weren't free at the same time to work together to tweak it to his satisfaction and get it merged.

That said, tuning this feature is extremely hard. How do you decide what scoring will work for the most people, and if it should be configurable, how?

For example, I think the match quality is important, but it shouldn't always win over the frequency. If I go into ~/music every day and I seldom go into ~/.config/mu, then z mu should go to ~/music, but only because the frequency score is much much higher. If there's not such an extreme difference, z mu should definitely match ~/.config/mu because it's a perfect match: string length, case, word boundaries, etc. But for this to work, it's important to not count "accidental directories" as visited. (See issue #428)

lefth avatar Jul 23 '22 11:07 lefth

The first question of tuning is how the matches should be ranked. They can either be sorted or they can be given a score.

If the matches are sorted, the algorithm is easier. For example a whole directory match is always better than a whole word match, which is better than a left-anchored word match, which is better than a right-anchored word match, which is better than a substring match. (For example, with a few more rules assumed: "bevy" would first match these directories in order: "bevy", "bevy-src", "bevy2", "temp-bevy", "tempbevy", "tempbevysrc".) (I'm leaving case sensitivity aside for now, as that makes it a lot more complicated. Case sensitivity is worth handling even if it's hard though.)
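Here is a rough sketch of the sorted approach. The five categories come from the paragraph above; the extra rule that a match at the very start of the name outranks a later one is only my guess at the "few more rules", chosen because it reproduces the example order. Case sensitivity and case-change word boundaries are ignored, and the function names are hypothetical.

```python
import re

LEFT = r"(?<![A-Za-z0-9])"   # no letter or digit just before the match
RIGHT = r"(?![A-Za-z0-9])"   # no letter or digit just after the match


def category(name, query):
    """0 whole directory, 1 whole word, 2 left-anchored word,
    3 right-anchored word, 4 substring, None if there is no match."""
    if name == query:
        return 0
    q = re.escape(query)
    patterns = [LEFT + q + RIGHT, LEFT + q, q + RIGHT, q]
    for rank, pattern in enumerate(patterns, start=1):
        if re.search(pattern, name):
            return rank
    return None


def rank(names, query):
    """Sort matching names from best to worst."""
    scored = [(name, category(name, query)) for name in names]
    matched = [(name, cat) for name, cat in scored if cat is not None]
    # Assumed extra rule: names that start with the query come first.
    matched.sort(key=lambda nc: (not nc[0].startswith(query), nc[1]))
    return [name for name, _ in matched]


names = ["tempbevysrc", "tempbevy", "temp-bevy", "bevy2",
         "bevy-src", "bevy", "other"]
print(rank(names, "bevy"))
```

Without the assumed start-of-name rule, "temp-bevy" would tie with "bevy-src" as a whole word match and sort ahead of "bevy2".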

The above is straightforward, but it is not amenable to using match quality together with frequency score. By contrast, if the algorithm says the search string "bevy" matches "bevy" with 400 match points and 50 frequency points, it can still decide that "bevy-src", with 200 match points and 720 frequency points, wins. This system is hard to design and hard to tune, which is part of why I did not keep developing my branch or claim that my way was the right way. This feature needs a lot of tuning, and some users won't be happy with the result.

lefth avatar Jul 23 '22 11:07 lefth

Personally, I think a full match should always be chosen. If mu isn't enough then type mus. That's already what happens when you are in a folder that contains that exact match. For example, if I'm in my /repos folder and I type z bevy it always goes to bevy, but if I am in any other folder on my machine it doesn't go to bevy, which is exactly why this is annoying.

At least make it opt-in. I wouldn't mind having to do something like z -f bevy; I'll just alias it to `zf` or something.

If you want to make it more complicated, the length should be taken into account. I believe 4 letters is enough for an exact match, but 1-2 letters is probably not enough.

IceSentry avatar Jul 24 '22 17:07 IceSentry

That's a great idea. Another benefit of having a minimum length for "exact match" is that it can potentially be configured by users, or turned off by setting the exact match min length to 9999.
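A small sketch of how such a threshold might behave, assuming entries are (directory name, score) pairs and a hypothetical `exact_min_len` setting (neither is an existing zoxide option):

```python
def pick(entries, query, exact_min_len=4):
    """Pick a directory name for `query`.

    An exact name match wins outright, but only when the query is at
    least `exact_min_len` characters long; otherwise the highest score
    among substring matches wins.
    """
    matches = [(name, score) for name, score in entries if query in name]
    if not matches:
        return None
    if len(query) >= exact_min_len:
        for name, _ in matches:
            if name == query:
                return name
    return max(matches, key=lambda entry: entry[1])[0]


# Made-up names and scores for illustration.
entries = [("bevy", 5.0), ("bevy-game", 80.0), ("mu", 3.0), ("music", 90.0)]

print(pick(entries, "bevy"))                      # long enough: exact match
print(pick(entries, "mu"))                        # too short: score decides
print(pick(entries, "bevy", exact_min_len=9999))  # exact matching disabled
```

Setting the threshold very high turns the rule off, as described above, and lowering it to 2 would make z mu pick mu.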

(To be clear this isn't enough, but I'd say it's okay if it's implemented in a way that can easily be extended by adding additional heuristics. So it would be best if the first commit of this feature contains at least two match quality heuristics.)

lefth avatar Jul 25 '22 00:07 lefth