swiper
`counsel-git-grep` appears to be slow
Today, I tried `counsel-git-grep` on one of my repos, which is roughly 12 KLoC, and Emacs was killed by a simple search pattern (one that must produce a lot of matches):
```
- command-execute 17395 95%
- call-interactively 17395 95%
- funcall-interactively 17395 95%
- evil-ex 17395 95%
- evil-ex-execute 17004 93%
- eval 17004 93%
- evil-ex-call-command 17004 93%
- call-interactively 17004 93%
- funcall-interactively 17004 93%
- counsel-git-grep 16999 93%
- ivy-read 16999 93%
- read-from-minibuffer 16956 93%
- ivy--queue-exhibit 16773 92%
- ivy--exhibit 16773 92%
- ivy--filter 16746 92%
- counsel-git-grep-matcher 15648 86%
- cl-remove-if-not 15648 86%
- apply 15648 86%
- cl-remove 15644 86%
- apply 15634 86%
- cl-delete 15595 85% <- WTF?
#<compiled 0x287ded5> 15551 85%
#<compiled 0x3800f05> 3 0%
- ivy--sort 1069 5%
- ivy--flx-sort 1062 5%
- mapcar 956 5%
+ #<compiled 0x2f73e0d> 956 5%
- cl-sort 76 0%
- sort 57 0%
#<compiled 0x19c9e05> 26 0%
sort 16 0%
+ ivy--recompute-index 29 0%
+ ivy--format 16 0%
+ ivy--insert-minibuffer 11 0%
+ #<compiled 0x1f696a9> 106 0%
+ redisplay_internal (C function) 10 0%
+ minibuffer-inactive-mode 9 0%
+ command-execute 7 0%
+ timer-event-handler 3 0%
+ fci-redraw-frame 3 0%
+ ivy--reset-state 39 0%
+ profiler-report 4 0%
profiler-start 1 0%
+ read-from-minibuffer 367 2%
+ timer-event-handler 557 3%
+ ... 205 1%
```
`cl-delete` appears to be the bottleneck.
Can you reproduce this from `emacs -Q` or `make plain`? What is your full `M-x emacs-version RET`?
I found out that the fuzzy builder

```elisp
ivy-re-builders-alist '(...
                        (t . ivy--regex-fuzzy))
```

is the ultimate cause of this. Switching it to `ivy--regex-plus` resolves the problem. Do you have the same experience? Any fancy tweaks to make fuzzy matching faster?
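For context, the two builders transform the minibuffer input very differently: `ivy--regex-fuzzy` interleaves every input character with a wildcard, while `ivy--regex-plus` only substitutes spaces. A rough Python sketch of the idea (the real builders in ivy.el emit Emacs-style regexes with capture groups for highlighting, which this omits):

```python
import re

def regex_fuzzy(query: str) -> str:
    # Interleave every character with ".*": "abc" -> "a.*b.*c".
    # On long candidate strings this pattern backtracks heavily.
    return ".*".join(re.escape(ch) for ch in query)

def regex_plus(query: str) -> str:
    # Replace each space with ".*": "two words" -> "two.*words".
    return ".*".join(re.escape(part) for part in query.split())

print(regex_fuzzy("abc"))        # a.*b.*c
print(regex_plus("two words"))   # two.*words
```

The per-character wildcards are why a fuzzy pattern gets so much more expensive on long candidates than a plus-style pattern.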
> Do you have the same experience?
I'm not able to check right now, but I wouldn't be surprised if that's the case. As I mentioned in https://github.com/abo-abo/swiper/issues/1801#issuecomment-440860883, non-default configurations of ivy-re-builders-alist are known to work less smoothly given lower interest in them. If you search the issue tracker for ivy--regex-fuzzy you should find several related issues.
The fuzzy matcher chokes when the collection contains a candidate that's a particularly long string (several hundred or even thousands of characters). See #1749.
The fix would likely involve some trade-off, like ignoring overly long strings, or parts of them.
I think the trade-off of using `ivy--regex-plus` is preferable here.
Suggestions welcome.
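One way to sketch the "ignore overly long strings" trade-off is a pre-filter that only runs the expensive fuzzy regex on candidates below a length cutoff (this is a hypothetical sketch, not code from ivy; the cutoff value and the literal fallback are both assumptions):

```python
import re

MAX_FUZZY_LEN = 400  # hypothetical cutoff; would need tuning

def fuzzy_candidates(candidates, pattern, max_len=MAX_FUZZY_LEN):
    """Run the expensive fuzzy regex only on reasonably short candidates;
    fall back to a cheap literal substring test for very long ones."""
    rx = re.compile(pattern)
    literal = pattern.replace(".*", "")  # crude literal fallback
    out = []
    for cand in candidates:
        if len(cand) <= max_len:
            if rx.search(cand):
                out.append(cand)
        elif literal in cand:  # cheap check for overly long lines
            out.append(cand)
    return out
```

Long lines would then still match on an exact substring, just without fuzzy semantics, which bounds the worst-case cost.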
Hi @abo-abo, I already saw #1749 and others as well. I also found hlissner/doom-emacs#774, which concludes the story by dropping Ivy altogether in favor of Helm, as Helm's fuzzy filtering does not choke. This is not the first thread that ends with this conclusion; I saw some others on Reddit as well. Consequently, I would like to ask a potentially dumb question: could it be that the way Ivy uses flx is inefficient? In the end there can be only three explanations: either Helm uses some in-house fuzzy matching which is superior to flx, or flx itself is good enough but the way Ivy utilizes it in its filtering logic is inefficient, or Helm already applies the aforementioned trade-offs. In any case, taking a glance at Helm and reviewing Ivy's use of flx would not hurt. I also found prescient.el by @raxod502 yesterday as an alternative to flx.
And frankly speaking, fzf has long since proven itself a superior fuzzy matcher. It can be used in the terminal: you simply pipe any output to it, it performs fuzzy filtering on top, and you select interactively from the results. There is also a Vim plugin which bridges it and offers fuzzy filtering for listing files, buffers, tags, git logs, and much more inside Vim. I'm aware of counsel-fzf, but that only offers fuzzy file search. I think we instead need to create an fzf-based filtering back end which could be fed from any data source (grep, rg, ag, find-file, switch-to-buffer, etc.) in the same way flx-based filtering is applied now. This should also solve the problem of long strings forever, as it would no longer be Emacs's responsibility to deal with them. See How FZF and ripgrep improved my workflow for a few highlights.
My understanding is that Helm implements its own fuzzy-matching algorithms and does not use flx, at least by default. I do not use Helm, however, so I could be wrong about this.
@Alexander-Shukaev I understand that it can be frustrating for Ivy users to deal with flx sluggishness.
But it's also frustrating for me as the package author when people use Ivy in flx mode and then complain that it works poorly. I have always taken care of the default matcher, and it's been working fast for the 4 years that Ivy has been out. I don't want to say that the flx matcher comes "AS IS", but when people report issues related to flx performance, I would appreciate as much help as possible:
- If you can add a patch that improves performance, great.
- If you can produce a use-case that compares flx performance of Ivy and another package like Helm, with full data on platform, Emacs version, minimal config with `emacs -Q`, relevant installed programs, sample code repository or text file if it involves grepping, benchmark results, etc., good.
- If you only say: "ivy and flx is not fast for my repository of 400 files, but helm is fast", that's not really enough info for me to debug and solve the problem.
Regarding fzf:
- The post that you link mentions fzf for finding files, which is `counsel-fzf`.
- For grepping, the post uses the space as a wildcard character, which is exactly `counsel-rg`. Adding `fzf` here would not result in any improvement.
- `ivy-switch-to-buffer` is already performant enough.
If you have an idea of using fzf that offers a more concrete improvement, please open a new issue.
I am using Ivy fuzzy matching everywhere except swiper, because it is super slow on big files, and you can forget about swiper-all with it. For the same reason I am using counsel-fzf for files (works perfectly, especially with a properly set up fd). However, I am a little confused how you would like to use fuzzy matching on regexp search. If you provide a literal pattern like this word, it is possible (though I am not sure that this is helpful), but if you put a pattern like \bthis word\b, I think there is no fuzzy matching to be done here, right? I am always using counsel-rg; I have never needed fuzzy matching here, and I think I've never got any fuzzy-matched results even on literal patterns. I have never had problems with counsel-rg.
Apart from this, I agree that flx is not impressive. As a quick solution, I think that whenever fuzzy matching is needed it could be implemented through an fzf invocation. For example, to perform fuzzy matching on the buffer list, you could pipe this list to fzf with the input pattern as a parameter (similar to how it is done for counsel-fzf). Currently counsel-fzf is invoked through a shell, like bash first and then bash invokes fzf; that could be sped up by removing bash and directly calling fzf.
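The "invoke the tool directly, without an intermediate shell" idea can be sketched with Python's subprocess module. Here `grep -F` stands in for the external filter so the sketch is self-contained; with fzf installed, its non-interactive `fzf --filter QUERY` mode is what one would actually call. The argv-list form avoids spawning a shell at all:

```python
import subprocess

def filter_candidates(candidates, query):
    # Pass an argv list so no shell sits between us and the filter.
    # The fzf equivalent would be ["fzf", "--filter", query];
    # grep -F stands in here as a plain substring filter.
    proc = subprocess.run(
        ["grep", "-F", query],
        input="\n".join(candidates),
        capture_output=True,
        text=True,
    )
    return proc.stdout.splitlines()

print(filter_candidates(["*scratch*", "ivy.el", "counsel.el"], ".el"))
# ['ivy.el', 'counsel.el']
```

Feeding candidates via stdin like this is exactly the "pipe any list to the matcher" workflow described above, minus the shell startup cost.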
> Currently `counsel-fzf` is invoked through a shell, like `bash` first and then `bash` invokes `fzf`; that could be sped up by removing `bash` and directly calling `fzf`.
I've implemented your suggestion. Please test.
> but if you put a pattern like `\bthis word\b`, I think there is no fuzzy matching to be done here, right
That's right, fzf does not support \b: https://github.com/junegunn/fzf/issues/534.
@ChoppinBlockParty,
> For the same reason I am using counsel-fzf for files (works perfectly, especially with a properly set up fd).
`rg` is even faster than `fd` for this (due to the parallel iterator, I assume).
> However, I am a little confused how you would like to use fuzzy matching on regexp search. If you provide a literal pattern like this word, it is possible (though I am not sure that this is helpful), but if you put a pattern like \bthis word\b, I think there is no fuzzy matching to be done here, right? I am always using counsel-rg; I have never needed fuzzy matching here, and I think I've never got any fuzzy-matched results even on literal patterns. I have never had problems with counsel-rg.
In fact, `counsel-ag` and `counsel-rg` work by replacing each space with a wildcard `.*` (@abo-abo also mentioned this above). You can try typing two words and see how they are matched (aka `ivy--regex-plus`).
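A sketch of how a frontend could hand that rewritten pattern straight to rg (the flag list here is illustrative, not counsel's actual invocation):

```python
def rg_command(query: str, extra_args=()):
    # Each space becomes ".*", so "two words" matches lines where
    # "two" precedes "words"; rg's regex engine then does the heavy
    # lifting instead of Emacs.
    pattern = ".*".join(query.split())
    return ["rg", "--no-heading", "--line-number", *extra_args, pattern]

print(rg_command("two words"))
# ['rg', '--no-heading', '--line-number', 'two.*words']
```

The key point is that the pattern rewrite is trivial; all the matching cost moves into the external tool.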
@abo-abo,
Now here's what I'm getting at. In #1801, counsel-ag and counsel-rg were already taught to respect ivy--regex-fuzzy by passing each of the characters in the search input separated by a wildcard .*, which results in fuzzy-like output (it's exactly the definition of ivy--regex-fuzzy). So now it makes total sense to implement the same for counsel-git-grep and any other counsel-* grep-like function, as this will offload the regular-expression-engine efficiency problem (e.g. #1749) to a specialized tool.
However, as confirmed in #1801 and also from my own experience, bare fuzzy matching is far from enough: there will simply be too many results, and especially false matches which you don't even want/expect to see in the output. For instance, I disabled fuzzy matching for file/buffer searches long ago exactly because of that, as I realized that it does not score well with these types of searches, while I still had it on for grep-like searches as it seemed to perform best in that domain. That's where flx or some other, potentially more efficient, scoring/ranking/sorting tools come into play. Scoring should be applied on top of the output of any search tool whenever that tool was invoked in ivy--regex-fuzzy mode (e.g. that's the missing piece for #1801, and it will also be missing if counsel-git-grep follows the changes done for #1801).
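The scoring step described above can be sketched with a toy subsequence scorer (this is not flx's actual algorithm; flx adds word-boundary and camelCase heuristics, and the weights here are arbitrary):

```python
def fuzzy_score(query, candidate):
    """Return None if query is not a subsequence of candidate, otherwise
    a score with a bonus for contiguous matches and a small penalty for
    matching late in the string."""
    if not query:
        return 0
    score, pos, prev, first = 0, 0, -2, None
    for ch in query:
        idx = candidate.find(ch, pos)
        if idx == -1:
            return None  # not a subsequence: candidate is filtered out
        if first is None:
            first = idx
        score += 2 if idx == prev + 1 else 1  # contiguity bonus
        prev, pos = idx, idx + 1
    return score - first * 0.01

# Filter with the fuzzy pattern first, then rank the survivors by score:
cands = ["counsel-git-grep", "counsel-grep", "ivy--regex-fuzzy"]
matches = [c for c in cands if fuzzy_score("cgrep", c) is not None]
ranked = sorted(matches, key=lambda c: -fuzzy_score("cgrep", c))
print(ranked)  # ['counsel-grep', 'counsel-git-grep']
```

The filtering pass and the scoring pass are independent, which is exactly why scoring can sit on top of any external tool's output.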
[I know that this is what you're already doing for e.g. find-file or switch-to-buffer, albeit with Emacs's built-in regular expression engine (i.e. still subject to #1749, but less relevant there, since incredibly long file paths and buffer names are quite far-fetched).]
When this is done, say even with flx for the time being, it would be interesting to see whether there is a performance/usability improvement compared to the previous implementation. At the end of the day, the scoring algorithm is not that tricky; what is important is the choice of collection for high-performance sorting, and taming the garbage collector.