swiper
`counsel-git-grep` appears to be slow
Today, I tried `counsel-git-grep` on one of my repos, which is roughly 12 KLoC, and Emacs was killed by a simple search pattern (one that must produce a lot of matches):
```
- command-execute 17395 95%
- call-interactively 17395 95%
- funcall-interactively 17395 95%
- evil-ex 17395 95%
- evil-ex-execute 17004 93%
- eval 17004 93%
- evil-ex-call-command 17004 93%
- call-interactively 17004 93%
- funcall-interactively 17004 93%
- counsel-git-grep 16999 93%
- ivy-read 16999 93%
- read-from-minibuffer 16956 93%
- ivy--queue-exhibit 16773 92%
- ivy--exhibit 16773 92%
- ivy--filter 16746 92%
- counsel-git-grep-matcher 15648 86%
- cl-remove-if-not 15648 86%
- apply 15648 86%
- cl-remove 15644 86%
- apply 15634 86%
- cl-delete 15595 85% <- WTF?
#<compiled 0x287ded5> 15551 85%
#<compiled 0x3800f05> 3 0%
- ivy--sort 1069 5%
- ivy--flx-sort 1062 5%
- mapcar 956 5%
+ #<compiled 0x2f73e0d> 956 5%
- cl-sort 76 0%
- sort 57 0%
#<compiled 0x19c9e05> 26 0%
sort 16 0%
+ ivy--recompute-index 29 0%
+ ivy--format 16 0%
+ ivy--insert-minibuffer 11 0%
+ #<compiled 0x1f696a9> 106 0%
+ redisplay_internal (C function) 10 0%
+ minibuffer-inactive-mode 9 0%
+ command-execute 7 0%
+ timer-event-handler 3 0%
+ fci-redraw-frame 3 0%
+ ivy--reset-state 39 0%
+ profiler-report 4 0%
profiler-start 1 0%
+ read-from-minibuffer 367 2%
+ timer-event-handler 557 3%
+ ... 205 1%
```
`cl-delete` appears to be the bottleneck.
Can you reproduce this from `emacs -Q` or `make plain`? What is your full `M-x emacs-version RET`?
I found out that the fuzzy builder

```elisp
ivy-re-builders-alist '(...
                        (t . ivy--regex-fuzzy))
```

is the ultimate cause of this. Switching it to `ivy--regex-plus` resolves the problem. Do you have the same experience? Any fancy tweaks to make fuzzy matching faster?
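For context, the two builders transform the minibuffer input very differently: `ivy--regex-fuzzy` interleaves every input character with a wildcard, while `ivy--regex-plus` only substitutes spaces. A rough Python sketch of the idea (the real builders in ivy.el emit Emacs-style regexes with capture groups for highlighting, which this omits):

```python
import re

def regex_fuzzy(query: str) -> str:
    # Interleave every character with ".*": "abc" -> "a.*b.*c".
    # On long candidate strings this pattern backtracks heavily.
    return ".*".join(re.escape(ch) for ch in query)

def regex_plus(query: str) -> str:
    # Replace each space with ".*": "two words" -> "two.*words".
    return ".*".join(re.escape(part) for part in query.split())

print(regex_fuzzy("abc"))        # a.*b.*c
print(regex_plus("two words"))   # two.*words
```

The per-character wildcards are why a fuzzy pattern gets so much more expensive on long candidates than a plus-style pattern.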
> Do you have the same experience?
I'm not able to check right now, but I wouldn't be surprised if that's the case. As I mentioned in https://github.com/abo-abo/swiper/issues/1801#issuecomment-440860883, non-default configurations of ivy-re-builders-alist are known to work less smoothly given lower interest in them. If you search the issue tracker for ivy--regex-fuzzy you should find several related issues.
The fuzzy matcher chokes when the collection contains a candidate that's a particularly long string (several hundred or even thousands of characters). See #1749.
The fix would likely involve some trade-off, like ignoring overly long strings, or parts of them.
I think the trade-off of using `ivy--regex-plus` is preferable here.
Suggestions welcome.
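One way to sketch the "ignore overly long strings" trade-off is a pre-filter that only runs the expensive fuzzy regex on candidates below a length cutoff (this is a hypothetical sketch, not code from ivy; the cutoff value and the literal fallback are both assumptions):

```python
import re

MAX_FUZZY_LEN = 400  # hypothetical cutoff; would need tuning

def fuzzy_candidates(candidates, pattern, max_len=MAX_FUZZY_LEN):
    """Run the expensive fuzzy regex only on reasonably short candidates;
    fall back to a cheap literal substring test for very long ones."""
    rx = re.compile(pattern)
    literal = pattern.replace(".*", "")  # crude literal fallback
    out = []
    for cand in candidates:
        if len(cand) <= max_len:
            if rx.search(cand):
                out.append(cand)
        elif literal in cand:  # cheap check for overly long lines
            out.append(cand)
    return out
```

Long lines would then still match on an exact substring, just without fuzzy semantics, which bounds the worst-case cost.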
Hi @abo-abo, I already saw #1749 and others as well. I also found hlissner/doom-emacs#774, which concludes the story by dropping Ivy altogether in favor of Helm, as Helm's fuzzy filtering does not choke. This is not the first thread that ends with this conclusion; I saw some others on Reddit as well. Consequently, I would like to ask a potentially dumb question: could it be that the way Ivy uses flx is inefficient? In the end there can be only three explanations: either Helm uses some in-house fuzzy matching which is superior to flx, or flx itself is good enough but the way Ivy utilizes it in its filtering logic is inefficient, or Helm already applies the aforementioned trade-offs. In any case, taking a glance at Helm and reviewing Ivy's use of flx would not hurt. I also found prescient.el by @raxod502 yesterday as an alternative to flx.
And frankly speaking, fzf has long since proven itself a superior fuzzy matcher. It can be used in the terminal: you simply pipe any output to it, it performs fuzzy filtering on top, and you select interactively from the results. There is also a Vim plugin which bridges it and offers fuzzy filtering for listing files, buffers, tags, git logs, and much more inside Vim. I'm aware of counsel-fzf, but that only offers fuzzy file search. I think we instead need to create an fzf-based filtering back end which could be fed from any data source (grep, rg, ag, find-file, switch-to-buffer, etc.) in the same way flx-based filtering is applied now. This should also solve the problem of long strings forever, as it would no longer be Emacs's responsibility to deal with them. See How FZF and ripgrep improved my workflow for a few highlights.
My understanding is that Helm implements its own fuzzy-matching algorithms and does not use flx, at least by default. I do not use Helm, however, so I could be wrong about this.
@Alexander-Shukaev I understand that it can be frustrating for Ivy users to deal with flx sluggishness.
But it's also frustrating for me as the package author when people use Ivy in flx mode and then complain that it works poorly. I have always taken care of the default matcher, and it's been working fast for the 4 years that Ivy has been out. I don't want to say that the flx matcher comes "AS IS", but when people report issues related to flx performance, I would appreciate as much help as possible:
- If you can add a patch that improves performance, great.
- If you can produce a use-case that compares flx performance of Ivy and another package like Helm, with full data on platform, Emacs version, minimal config with `emacs -Q`, relevant installed programs, sample code repository or text file if it involves grepping, benchmark results, etc., good.
- If you only say: "ivy and flx is not fast for my repository of 400 files, but helm is fast", that's not really enough info for me to debug and solve the problem.
Regarding fzf:
- The post that you link mentions fzf for finding files, which is `counsel-fzf`.
- For grepping, the post uses the space as a wildcard character, which is exactly `counsel-rg`. Adding `fzf` here would not result in any improvement.
- `ivy-switch-to-buffer` is already performant enough.
If you have an idea of using fzf that offers a more concrete improvement, please open a new issue.
I am using Ivy fuzzy matching everywhere except swiper, because it is super slow on big files, and you can forget about swiper-all with it. For the same reason I am using counsel-fzf for files (works perfectly, especially with a properly set up fd). However, I am a little confused how you would like to use fuzzy matching on regexp search. If you provide a literal pattern like this word, it is possible (though I am not sure that this is helpful), but if you put a pattern like \bthis word\b, I think there is no fuzzy matching to be done here, right? I am always using counsel-rg; I have never needed fuzzy matching here, and I think I've never got any fuzzy-matched results even on literal patterns. I have never had problems with counsel-rg.
Apart from this, I agree that flx is not impressive. As a quick solution, I think that whenever fuzzy matching is needed it could be implemented through an fzf invocation. For example, to perform fuzzy matching on the buffer list, you could pipe this list to fzf with the input pattern as a parameter (similar to how it is done for counsel-fzf). Currently counsel-fzf is invoked through a shell, like bash first and then bash invokes fzf; that could be sped up by removing bash and directly calling fzf.
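The "invoke the tool directly, without an intermediate shell" idea can be sketched with Python's subprocess module. Here `grep -F` stands in for the external filter so the sketch is self-contained; with fzf installed, its non-interactive `fzf --filter QUERY` mode is what one would actually call. The argv-list form avoids spawning a shell at all:

```python
import subprocess

def filter_candidates(candidates, query):
    # Pass an argv list so no shell sits between us and the filter.
    # The fzf equivalent would be ["fzf", "--filter", query];
    # grep -F stands in here as a plain substring filter.
    proc = subprocess.run(
        ["grep", "-F", query],
        input="\n".join(candidates),
        capture_output=True,
        text=True,
    )
    return proc.stdout.splitlines()

print(filter_candidates(["*scratch*", "ivy.el", "counsel.el"], ".el"))
# ['ivy.el', 'counsel.el']
```

Feeding candidates via stdin like this is exactly the "pipe any list to the matcher" workflow described above, minus the shell startup cost.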
> Currently `counsel-fzf` is invoked through a shell, like `bash` first and then `bash` invokes `fzf`; that could be sped up by removing `bash` and directly calling `fzf`.
I've implemented your suggestion. Please test.
> but if you put a pattern like `\bthis word\b`, I think there is no fuzzy matching to be done here, right
That's right, fzf does not support \b: https://github.com/junegunn/fzf/issues/534.
@ChoppinBlockParty,
> For the same reason I am using counsel-fzf for files (works perfectly, especially with a properly set up fd).
`rg` is even faster than `fd` for this (due to the parallel iterator, I assume).
> However, I am a little confused how you would like to use fuzzy matching on regexp search. If you provide a literal pattern like this word, it is possible (though I am not sure that this is helpful), but if you put a pattern like \bthis word\b, I think there is no fuzzy matching to be done here, right? I am always using counsel-rg; I have never needed fuzzy matching here, and I think I've never got any fuzzy-matched results even on literal patterns. I have never had problems with counsel-rg.
In fact, `counsel-ag` and `counsel-rg` work by replacing each space with a wildcard `.*` (@abo-abo also mentioned this above). You can try typing two words and see how they are matched (aka `ivy--regex-plus`).
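A sketch of how a frontend could hand that rewritten pattern straight to rg (the flag list here is illustrative, not counsel's actual invocation):

```python
def rg_command(query: str, extra_args=()):
    # Each space becomes ".*", so "two words" matches lines where
    # "two" precedes "words"; rg's regex engine then does the heavy
    # lifting instead of Emacs.
    pattern = ".*".join(query.split())
    return ["rg", "--no-heading", "--line-number", *extra_args, pattern]

print(rg_command("two words"))
# ['rg', '--no-heading', '--line-number', 'two.*words']
```

The key point is that the pattern rewrite is trivial; all the matching cost moves into the external tool.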
@abo-abo,
Now here's what I'm getting at. In #1801, counsel-ag and counsel-rg were already taught to respect ivy--regex-fuzzy by passing each of the characters in the search input separated by a wildcard .*, which results in fuzzy-like output (it's exactly the definition of ivy--regex-fuzzy). So now it makes total sense to implement the same for counsel-git-grep and any other counsel-* grep-like function, as this will offload the regular-expression-engine efficiency problem (e.g. #1749) to a specialized tool.
However, as confirmed in #1801 and also from my own experience, bare fuzzy matching is far from enough: there will simply be too many results, and especially false matches which you don't even want/expect to see in the output. For instance, I disabled fuzzy matching for file/buffer searches long ago exactly because of that, as I realized that it does not score well with these types of searches, while I still had it on for grep-like searches as it seemed to perform best in that domain. That's where flx or some other, potentially more efficient, scoring/ranking/sorting tools come into play. Scoring should be applied on top of the output of any search tool whenever that tool was invoked in ivy--regex-fuzzy mode (e.g. that's the missing piece for #1801, and it will also be missing if counsel-git-grep follows the changes done for #1801).
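The scoring step described above can be sketched with a toy subsequence scorer (this is not flx's actual algorithm; flx adds word-boundary and camelCase heuristics, and the weights here are arbitrary):

```python
def fuzzy_score(query, candidate):
    """Return None if query is not a subsequence of candidate, otherwise
    a score with a bonus for contiguous matches and a small penalty for
    matching late in the string."""
    if not query:
        return 0
    score, pos, prev, first = 0, 0, -2, None
    for ch in query:
        idx = candidate.find(ch, pos)
        if idx == -1:
            return None  # not a subsequence: candidate is filtered out
        if first is None:
            first = idx
        score += 2 if idx == prev + 1 else 1  # contiguity bonus
        prev, pos = idx, idx + 1
    return score - first * 0.01

# Filter with the fuzzy pattern first, then rank the survivors by score:
cands = ["counsel-git-grep", "counsel-grep", "ivy--regex-fuzzy"]
matches = [c for c in cands if fuzzy_score("cgrep", c) is not None]
ranked = sorted(matches, key=lambda c: -fuzzy_score("cgrep", c))
print(ranked)  # ['counsel-grep', 'counsel-git-grep']
```

The filtering pass and the scoring pass are independent, which is exactly why scoring can sit on top of any external tool's output.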
[I know that this is what you're already doing for e.g. find-file or switch-to-buffer, albeit with Emacs's built-in regular expression engine (i.e. still subject to #1749, but less relevant there, since incredibly long file paths and buffer names are quite far-fetched).]
When this is done, say even with flx for the time being, it would be interesting to see whether there is a performance/usability improvement compared to the previous implementation. At the end of the day, the scoring algorithm is not that tricky; what is important is the choice of collection for high-performance sorting, and taming the garbage collector.