vim Fuzzy completion is incorrect when completeopt contains longest

Steps to reproduce

vim -u NONE -U NONE
input: xyz xzy (begin with the letter x)
set completeopt+=fuzzy
enter y or z, then press <C-n>, both xyz and xzy will appear in the candidate list
set completeopt+=longest
repeat step 4, now the xyz is the only match

Expected behaviour

When set completeopt+=longest,fuzzy, if we enter y or z, then press <C-n>, both xyz and xzy will appear in the candidate list, or neither will appear. Of course, it is best if neither will appear in the candidate list.

Version of Vim

9.1.0598

Environment

The bug was introduced by patch 9.1.0598

Logs and stack traces

No response

Jul 18 '24 14:07 roccomao

When fuzzy is used, the leader input may be scattered in different positions of the string. This means that the substring may not exist. So when fuzzy exists, should the longest be ignored? I would like to hear other people's ideas.

Jul 19 '24 07:07 glepnir

I personally think it is fuzzy collection or fuzzy filtering ?? This determines their priority relative to longest option. My personal opinion is that as chrisbra described in the document before: Only fuzzy filtering should be done, then the priority is lower than the longest option. There are several reasons:

First of all, vim will not automatically pop up the completion menu, so when we press the completion key (like <C-n>), our purpose is actually to start the completion from the leading character of the input, so longest should play his role. At this time, many candidate lists will appear. With fuzzy option, we don't need to use <C-p> or <C-n> or up and down keys many times to select the target completion. We can quickly locate it through the fuzzy filtering function provided by fuzzy option, and your implementation before patch 9.1.0598 is indeed the same.
Secondly, when we enter the leading key and press the completion key (such as <C-n>), the order of the candidate list is not what we expected. Fuzzy filtering will not take effect until we press any letter. The above figure also shows the different effects of fuzzy collection and fuzzy filtering.
Finally, to be as compatible as possible, fuzzy option should add functionality rather than mutually exclusive with existing functionality.

On the other hand, if fuzzy collection and fuzzy filtering are separate functions (#15294), it may be more clear to explain. Let's discuss further, the premise is still set the option set completeopt+=longest,fuzzy.

If fuzzy option only has the function of fuzzy filtering, consider the following words: hello helio think test. When we press h then <C-n>, Since there is no fuzzy collection, the candidate list will only have hello helio and not think, so it is natural to work with the longest role first, and then perform fuzzy filtering on the next input. If we enter ho then <C-n>, the candidate list will be empty, so nothing will happen.
If the candidate collection function is also included (I really hope to add a new option, such as fuzzycollect), I probably can think of two more functions that need to be implemented now:
1. When the completion key is pressed (like <C-n>), fuzzy filtering needs to be performed while fuzzy collection is performed. If it is still the example in the above picture, the effect is similar to the following (Currently, As a demonstration, I achieve this by pressing h then <C-n><C-p>, plus another letter and then pressing the <BackSpace> key to enable fuzzy filtering to work): Otherwise, let's put this in the first word, then press h and <C-n>, the candidate list as follows (The default order of fuzzy collection is consistent with the order in which the words appear):
2. If the longest option is included, the effect of the noinsert option should be implemented by default. At this time, only the candidate list is fuzzy filtered without insertion, because as you said, "the leader input may be scattered in different positions of the string", the longest match cannot be accurately defined at this time, so we need to achieve the effect of noinsert option.
3. Of course, issue #15296 may still need to be solved.

Finally, thank you very much for your support of fuzzy option. At least when there is no patch 9.1.0598, everything meets my needs perfectly. I don't know what other people think. I also want to hear more ideas from more people.

Jul 19 '24 11:07 roccomao

My 2c, I would like to have a possibility of fuzzy collecting + fuzzy filtering as it works now. This is also consistent with completion of commands when wildoptions has fuzzy, here I typed :es<tab>:

Jul 19 '24 11:07 habamax

quick respond to the order problem in @roccomao second point mentioned in the picture. There is no score sorting here. Because as long as the first matching result is found, it will be inserted. If the score is sorted, it will be a bit troublesome, so I gave up for the time being.

Since there is a lot of content, it may take me some time to understand and respond. I roughly understand the idea here. But it is difficult to satisfy everyone T.T Any suggestions are appreciated. Thanks for everyone :) I will still prioritize fixing existing bugs to avoid any hangs that cause vim to become unusable 😄

Jul 19 '24 11:07 glepnir

@habamax I certainly agree that fuzzy collection and fuzzy filtering can work together, but it might be better to split the options. First of all, the fuzzy option of wildoptions doesn't seem to support fuzzy filtering. You can try enter ec<Tab> and then press g, it won't help us choose echomsg or others. Secondly, it doesn't contain the longest option, so there is no conflict. So here I suggest splitting the functions for fuzzy option, which will have more control space to adapt to various needs.

Jul 19 '24 12:07 roccomao

@glepnir Sorry, because I don't know your internal implementation, I simply thought that since there are collection and filtering, I naturally thought that h should be used for collection first, and then h should be used for filtering again by default. Because it will also perform filtering when we enter other letters, even if we delete this letters at this time, and only the letter h is left, the candidate list is also a fuzzy filter for h.

In short, thank you for your work, and I look forward to your subsequent implementation. FUZZY is a very great feature 👍

Jul 19 '24 12:07 roccomao

quick idea that came to my mind was fuzzy:matchseq mentioned in the previous PR. Set matchseq to ensure the length of the string to match. fuzzy:3 type cmpd sue cmp to get result and use d for fuzzy filter? not sure need more thinking

Jul 19 '24 12:07 glepnir

Some thoughts after trying to play with fuzzy matching a little. To be honest as mostly an end user (since I only casually observed the previous commits) I'm kind of confused using the current implementation (hence why I am making this post).

Longest / fuzzy-collect behavior: If the above example (xyz and xzy, completing starting from y), using longest should end up having the completion being x only. Note that the whole point of "longest" is to insert the "longest common text of the matches" per the documentation. The "common text" part is really the main reason for this option existing to begin with (just think about how you use the option in day-to-day editing before fuzzy was implemented). We aren't trying to literally insert any longest fuzzy match. In this case, x is the longest initial substring and that should be the one that's completed, even if you entered "y". Also, this is also how it would work if you write a custom completefunc that returns ['xyz', 'xzy'] and then do <C-X><C-U> on y. Consistency is really important here. I think the current behavior of completing xzy is wrong as it's probably not what the user wants and also inconsistent with how other completion works.

FWIW, I think intuitively and semantically, longest should always use non-fuzzy collect. The main reason, as I mentioned, is to get the longest common substring. Using it with fuzzy collect is a little odd, but if we do use fuzzy collect, then I think the behavior should be what I suggest here.

So basically, step 1:

Step 2 (this was simulated by writing a completefunc and then <C-X><C-U> as I mentioned above):
tags vs keyword completion: It's kind of confusing that right now completion of tags i_Ctrl-X_Ctrl-] doesn't use fuzzy collect, but keyword completion i_Ctrl-P does use it. That seems kind of inconsistent which to me is worse. Is there a reason it's different? Can we fix it?
Fuzzy collect+filter vs fuzzy filter only: Stepping back I think the core issue right now is we keep discussing fuzzy collect and fuzzy filtering but the actual issue is that this "collect vs filter" concept is a somewhat new one that we are introducing to Vim. It's really more an implementation detail that we are now trying to expose to the user. Previously there was never any difference between the two, except when implementing custom completion functions (e.g. completefunc) but even those are usually supposed to be written in a way that doesn't require the user thinking too much about this distinction.

For one, Vim does not make it clear which part of the substring is the initial collected string and which one is the filtered one. Let's say if I try to complete a tag (from the Vim development tags file) by doing ASS<C-X><C-]>EQ with completeopt=menu,noinsert,fuzzy (remember, tags are using non-fuzzy collect right now), this is what I see:

I think it's pretty hard for a user to visually tell that the "ASS" part is the initially collected string, and the "EQ" part is used for the additional fuzzy filtering. Furthermore, if you backspace enough to delete one of the "S" so you have "AS", and then you type "S" to get "ASS" again, now only "AS" is the collected string, the the third "S" is part of the filter. I think that's quite confusing tbh and would lead to people filing issues complaining that the behavior is non-intuitive.

I think it's ok to add an option to separate out fuzzy filtering only vs fuzzy collect, but I think it's important to think about the API / documentation / configuration that we are exposing to the user, instead of treating it as an afterthought. Most users don't know what features they are missing (unless you follow Vim dev intensely) but they definitely notice inconsistency and confusing features that behave oddly. In particularly I think it would be a good idea to find a better way to let the user know which part is the collected part (the "ASS" in my example) and which part is the filtered part (the "EQ"). I also feel like right now there are a fair bit of inconsistency and confusing bits because the semantics isn't clear and differs depending on what you do.

Oct 18 '24 06:10 ychin

Yes, there are some unclear issues at present. This is what I have been thinking about recently. First, longest should be ignored when fuzzy exists. Second, what does fuzzy mean here? Fuzzy in completeopt only means fuzzy filtering based on the completion list that has been used. It is right to clearly distinguish between collection and filtering. For some built-in completions, fuzzy collection should be used. There are many kinds of built-in completion modes and they are based on different implementations. It is difficult to break these behaviors. For example, keyword completion is generated by searchit and compl_pattern. We cannot generate fuzzy matches based on regular expressions. My idea is to add new options to complete option for fuzzy collection, such as set complete+=fc with .wb. However, there is no option to control file completion, tag completion, line completion, etc. If fuzzy collection is implemented on these modes, a new option may be needed. then use it like completefuzzycollect=k,f,w, keyword file whole-line control the builtin completion mode which use fuzzy collect. The plan is to implement fuzzy collection one by one. Avoid a lot of wrong behaviors that cause multiple modes to not work and bring bad experience. When I have a new draft, I will submit a PR. I hope you can give me more suggestions there.

thanks @ychin

Oct 18 '24 07:10 glepnir

First, longest should be ignored when fuzzy exists

Assuming we have fuzzy collect vs filter, then why should longest be ignored when only using fuzzy filter? The initial collection is not using fuzzy and longest could still be work just like before.

Even if we have fuzzy collect, the more I think about it the more I think we could still support it, by just using the longest substring (from the collected results) as I mentioned. For example, in the top example here, we would just complete to x. This seems reasonable to me and it's actually how user completion functions work now (e.g. completefunc). The user can always remove longest from the options if they don't like it.

Just to give more examples to be clear, so with fuzzy collect, let's say we have this:

foobar foobez footrest abc

If I have completeopt=fuzzy,fuzzy-collect,longest,menu (I invented a fuzzy-collect option here), and type fr then do completion, I would imagine it should complete to foo since there are two results ("foobar" and "footrest").

You can see this demonstrated in today's Vim if you use a completion function (set with completefunc option) like so:

function! FuzzyCompletion(findstart, base)
    if a:findstart
        return 0
    endif
    return matchfuzzy(['foobar', 'foobez', 'footrest', 'abc'], a:base)
endfunction

Which shows:

Note that the second screenshot is a little odd because it's highlighting "foo" in the popup which is actually kind of wrong as it should highlight the "f" and "r" in each string.

As I mentioned we should try to keep things as consistent and reasonable as possible. I think if the user specifies longest, then it would be nice to have a reasonable implementation instead of just blatantly not working. Right now, nocomplete does not work with longest but that is only because these two options fundamentally don't work together semantically. I'm not sure if longest is in that camp with fuzzy collect.

Either way I think we should try to make user completion functions work the same regarding longest,fuzzy (since it currently supports it). We should have a single consistent model as much as possible.

Second, what does fuzzy mean here? Fuzzy in completeopt only means fuzzy filtering based on the completion list that has been used. It is right to clearly distinguish between collection and filtering.

My point is that this may be clear to someone who spends a lot of time thinking about / working on Vim, I don't think this is immediately intuitive to an end user, because this is not generally how completions work and they just see a popup. If there are real reasons why we want to have non-fuzzy collection, but fuzzy filtering, then that is a new mode of thinking that we are imposing on our users, and per the screenshot I posted in my above comment, it's not really clear to the user which substring is the "collected" part, and which part is the "filtered" part. If you backspace and type the same character again, now the behavior also changes because the last character is now used for filtering instead of collecting.

I'm just trying to make sure we think through this a little more to make sure the options we expose and the behaviors make sense. For example, ideally, the completion UI (including the popup menu) can do more to show the user which parts is the originally collected part and which is the new substring, but this could be a little too busy and impose more requirements on colorschemes.

Oct 18 '24 20:10 ychin

Assuming we have fuzzy collect vs filter, then why should longest be ignored when only using fuzzy filter?

it's a little unclear. I agree with all of your distinction. What I'm talking about here is the current fuzzy behavior. :)

completeopt=fuzzy,fuzzy-collect,longest,menu

This doesn't make sense. I mentioned it in the comment above. I would prefer that we have a new option to control those built-in completion modes to enable fuzzy collection. We have keyword filenames tags wholeline etc completion mode. this way we can handle them one by one. If a pattern cannot be fuzzy-ed, we can skip it. In this way, any incorrect implementation will not cause too many bugs, at least we can temporarily restore it to its original state.

Oct 19 '24 05:10 glepnir

completeopt=fuzzy,fuzzy-collect,longest,menu This doesn't make sense

Sure, that was just an example to show that we are talking about using fuzzy collect. My core point was more that fuzzy collect can still support the longest option and make sense, by following my example that I listed afterwards (since you mentioned longest would just be disabled if fuzzy is used). In that section I wasn't particularly discussing how fuzzy collect itself should be configured and whatnot.

So to summarize, my points in the first section was that::

fuzzy filter only AND longest should work just fine, which I think you agreed with.
fuzzy collect AND longest should also work. I think you disagreed with that point and I was trying to make my case there. In particularly, it should collect up to the longest substring. How we configure fuzzy collect is another discussion.

Oct 21 '24 19:10 ychin

I don't actually have a strong opinion here. It's just that supporting longest requires some extra work. Imagine that the leader is scattered at every position of the item. If supporting longest requires an additional fast algorithm to find out if there is a longest string that exists right? I'm worried about the efficiency here. do you have any better ideas?

Oct 22 '24 05:10 glepnir

You probably know the code better tbh regarding the performance and implementation time required. I just thought maybe we could reuse the way that longest is already working with custom completion functions like completefunc but I don't know if there are further edge cases to resolve (since the current way custom completion functions kind of feel like they work by accident when longest is turned on). But I will admit I haven't taken close look into the fuzzy implementations enough.

But sure, implement the minimum to get things working first and then add features is better than having broken things that you have to fix later. So it might make sense to just disable longest at least at first as you suggested if just to get to parity.

Oct 22 '24 08:10 ychin

I'm thinking of a slightly simpler way. based on the score obtained by fuzzy, the score in a certain highest range must be close to what the user wants. Search for Long common sub in this range if have then insert it. then our time complexity will be reduced. maybe this is a way ?

Oct 22 '24 08:10 glepnir

Maybe I'm not understanding the performance complexity here. After you have collected a list of strings using fuzzy-collect, you now have a list of random strings right? Finding the common substring just requires looping through each character in each string once, so it should just be O(n)? The moment you see something that's not common with the other choices you stop searching. Wouldn't it be more work to do all the score calculations and whatnot?

Oct 22 '24 09:10 ychin

idea is once we have all the candidates for fuzzy matching, then can use a linear scan algorithm to find the longest common prefix of these candidates. The algorithm flow is as follows:

Initialization: Take the first fuzzy match as the initial longest common prefix (the one with the highest score)
Iterative comparison: For each subsequent match, start comparing from the first character in turn, and update the length of the longest common prefix until the shortest common prefix is found. Termination condition: If the length of the common prefix is 0, it means there is no common prefix, and you can stop comparing directly. This is inefficient or stop comparing when the score is less than a certain threshold, which will reduce the complexity.
Implementation details The search for the longest common prefix is linear time, that is, 𝑂(𝑛⋅𝑘) where k is the length of the longest common prefix.

totally is O(fuzzy spent + n⋅k)

Oct 22 '24 09:10 glepnir