fzf.vim Consistent Tags/BTags with query

The BTags command is declared with an optional [QUERY], while the Tags command is declared with an optional [PREFIX]. Query handling in the BTags source generator is less restrictive than in Tags, whose source is generated with the external readtags command and lists a tag only if PREFIX matches the start of its name exactly. This commit relaxes the exact prefix match when readtags is used, allowing a fuzzy prefix similar to the BTags source generator. The fzf fuzzy score ranking in s:tags_sink will still rank the best fuzzy match on top, just as it does for an exact match in the current implementation.

The changes in this commit are useful for tags that carry a scope prefix, such as the functions s:tags_sink or fzf#vim#tags in Vim script code, in combination with key mappings of the form:

nnoremap <silent> <leader>l :execute "Tags '" . expand('<cword>')<CR>
nnoremap <silent> <leader>bl :execute "BTags '" . expand('<cword>')<CR>

If the cursor is placed on the word "tags" in the example fzf#vim#tags, so that <cword> expands to "tags", the relaxed Tags query will list the tags entry, while the current implementation will not. BTags, by contrast, already lists the entry with its current implementation, which is the inconsistency this commit addresses.
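The difference can be reproduced outside Vim with a minimal sketch. The two-entry tags file below is made up for illustration, and grep only stands in for the two matching strategies; it is not the command fzf.vim runs.

```shell
# Hypothetical two-entry tags file (tab-separated: name, file, address).
tags_file=$(mktemp)
printf 'fzf#vim#tags\texample.vim\t1\n' > "$tags_file"
printf 's:tags_sink\texample.vim\t2\n' >> "$tags_file"

query='tags'

# Strict prefix: the tag name must start with the query.
# grep -c exits non-zero when nothing matches, hence the "|| true".
prefix_matches=$(grep -c "^${query}" "$tags_file" || true)

# Relaxed: the query may appear anywhere in the tag name.
relaxed_matches=$(grep -c -E "^[^[:space:]]*${query}" "$tags_file" || true)

echo "prefix: ${prefix_matches}, relaxed: ${relaxed_matches}"
# prints: prefix: 0, relaxed: 2
rm -f "$tags_file"
```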

Sep 28 '24 15:09 fab4100

The reason we only allow a prefix is to make readtags perform fast binary search over huge tags files. I've tested your patch, but the performance difference is quite noticeable for a tags file of hundreds of MBs.

See https://github.com/junegunn/fzf.vim/issues/1524

Sep 29 '24 10:09 junegunn

Thanks for the referenced issue; I missed it somehow. It would be nice to have a homogeneous interface between Tags and BTags while still getting the benefit of readtags. My proposal above is actually not homogeneous, since the query would first need to be sanitized into a POSIX-compatible extended regex that is passed to readtags as a pre-filter, and the actual user query would have to be passed as the initial prompt to the fzf call as a post-filter. I will do some more benchmarking to see whether something like that is feasible with readtags as a pre-filter.
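A minimal sketch of the sanitizing step, assuming that backslash-escaping every extended-regex metacharacter is enough to make the query match literally. The helper name escape_ere is hypothetical and not part of fzf.vim or readtags, and the `/` delimiter used in a readtags -Q expression is deliberately left unhandled here.

```shell
# Hypothetical helper: escape ERE metacharacters so the query matches literally.
escape_ere() {
  printf '%s' "$1" | sed 's/[][\.^$*+?(){}|]/\\&/g'
}

escape_ere 'a.b*c'; echo          # prints: a\.b\*c
escape_ere 'fzf#vim#tags'; echo   # prints: fzf#vim#tags (nothing to escape)
```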

Sep 29 '24 11:09 fab4100

> It would be nice to have a homogeneous interface between Tags and BTags while still get the benefit of readtags.

I don't think it's possible. You can't avoid scanning through the whole list for non-prefix queries.

If performance is not a concern for you, you can redefine the Tags command like so in your configuration file.

command! -bang -nargs=* Tags call fzf#vim#tags('', fzf#vim#with_preview({ "options": ['--query', <q-args>], "placeholder": "--tag {2}:{-1}:{3..}" }), <bang>0)

Original definition:

https://github.com/junegunn/fzf.vim/blob/c5ce7908ee86af7d4090d2007086444afb6ec1c9/plugin/fzf.vim#L65

Sep 29 '24 12:09 junegunn

I believe you are correct. Using readtags with a regex as a pre-filter is not viable from a performance point of view. For people who already use rg and are willing to sacrifice some performance, it may be an option to use rg as the source filter in fzf#vim#tags(query, ...).
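As a rough sketch of that idea, the hypothetical helper below filters a tags file on the tag name field only, using rg when it is installed and falling back to grep otherwise. The function name tags_prefilter is an assumption for illustration, and the query is used as a regex fragment without any escaping.

```shell
# Hypothetical helper: list tags whose name (text before the first tab)
# contains the query. $1: query (regex fragment), $2: path to the tags file.
tags_prefilter() {
  query=$1
  tags_file=$2
  tab=$(printf '\t')
  # [^<tab>]* cannot cross the first tab, so only the name field is searched.
  pattern="^[^${tab}]*${query}"
  if command -v rg >/dev/null 2>&1; then
    rg --no-line-number -e "$pattern" "$tags_file"
  else
    grep -E -e "$pattern" "$tags_file"
  fi
}
```

It would be called as `tags_prefilter munmap tags`. Wiring the output into fzf#vim#tags is not shown here.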

Here are a few test results (the first test result corresponds to the implementation I use in the commit above):

Linux kernel tags file:

du tags = 1.2G  tags

query = 'munmap'

GOLD: readtags -t tags -e -p - "${query}"
real    0m0.001s
user    0m0.000s
sys     0m0.001s
Matches:  6

TEST: readtags -t tags -e -Q "(#/^[^[:space:]]*${query}\$/ \$name)" -l
real    0m9.632s
user    0m9.554s
sys     0m0.070s
Matches:  44

TEST: readtags -t tags -e -Q "(#/^${query}/ \$name)" -l
real    0m4.300s
user    0m4.238s
sys     0m0.060s
Matches:  6

TEST: rg ^${query} tags
real    0m0.218s
user    0m0.162s
sys     0m0.056s
Matches:  7

TEST: rg ${query} tags
real    0m0.215s
user    0m0.158s
sys     0m0.056s
Matches:  85

TEST: grep ^${query} tags
real    0m0.440s
user    0m0.360s
sys     0m0.080s
Matches:  6

Some other large project (tags file one order of magnitude smaller than Linux kernel):

du tags = 82M   tags

query = 'BulkData'

GOLD: readtags -t tags -e -p - "${query}"
real    0m0.001s
user    0m0.001s
sys     0m0.000s
Matches:  31

TEST: readtags -t tags -e -Q "(#/^[^[:space:]]*${query}\$/ \$name)" -l
real    0m0.417s
user    0m0.414s
sys     0m0.003s
Matches:  49

TEST: readtags -t tags -e -Q "(#/^${query}/ \$name)" -l
real    0m0.289s
user    0m0.289s
sys     0m0.000s
Matches:  31

TEST: rg ^${query} tags
real    0m0.012s
user    0m0.006s
sys     0m0.006s
Matches:  31

TEST: rg ${query} tags
real    0m0.012s
user    0m0.000s
sys     0m0.012s
Matches:  4112

TEST: grep ^${query} tags
real    0m0.046s
user    0m0.046s
sys     0m0.000s
Matches:  31

If I understand your suggestion for redefining the Tags command correctly, it would result in calling tags.pl with an empty query. For a tags file the size of the Linux kernel's, this would result in:

TEST: perl ../bin/tags.pl "" tags | rg ^${query}
real    0m8.072s
user    0m7.957s
sys     0m0.659s
Matches:  7

whereas using rg as a source filter in fzf#vim#tags(query, ...) would result in something on the order of

TEST: rg ^${query} tags
real    0m0.218s
user    0m0.162s
sys     0m0.056s
Matches:  7

which I could accept as a performance trade-off in exchange for a homogeneous Tags/BTags interface (unlike the perl solution above).

From your perspective, is it possible to support a custom 'source' filter in fzf#vim#tags(query, ...), so that I can avoid copy-pasting fzf#vim#tags(query, ...) into my personal config just to change that line?

Sep 29 '24 13:09 fab4100