ctrlp-py-matcher Tag search in ctrlp-py-matcher

@FelikZ ,

thanks for creating this plugin.. it really sped up the file/MRU search.

However, when I tried to search tags, it didn't really show any tags.

I added the following line:

let g:ctrlp_match_func = { 'match': 'pymatcher#PyMatch' }

to my .vimrc, and it led to fast fuzzy searching for files. However, when I tried to search tags using :CtrlPTag, with a tag Database from one of my files, it gave no results.

Then I commented out let g:ctrlp_match_func = { 'match': 'pymatcher#PyMatch' }, and did :CtrlPTag with a tag database from one of my files. This time, it was slow to find that tag, but it did find the tag.

Are there any additional settings I should use to enable tag search using this plugin?

Jun 11 '18 13:06 alphaCTzo7G

Hi @alphaCTzo7G , I am not 100% sure whether this can be fixed by a configuration change, as I never used CtrlPTag with this plugin at all. I think the plugin might require an update to support this. I have not been using vim for a long time now, so I can't promise to implement this. Sorry!

Jun 21 '18 14:06 FelikZ

@FelikZ .. thanks for your help..

Actually I found that :CtrlPTag does work for most repos using this plugin, and it vastly speeds up search..

For some reason, CtrlPTag stops working if my repo has a certain set of characteristics:

python repo
has a large env folder
large code base

Otherwise it works: https://github.com/ctrlpvim/ctrlp.vim/issues/442#issuecomment-396689275

I will try to figure out why this is so..

Jun 22 '18 16:06 alphaCTzo7G

In general CtrlPTag works well.. but for certain libraries it fails.

I was able to create a small test.. Seems like the problem happens only when there are certain libraries like jinja2 in the virtual env, but doesn't happen for all libraries..

I created a small test repo here: https://github.com/alphaCTzo7G/test to reproduce the problem..

Jul 02 '18 01:07 alphaCTzo7G

@FelikZ , I figured out the root cause.. It is because vim.eval crashes when a:items contains BOM fields.

It seems that @ludovicchabant also faced the same issue and had to modify https://github.com/ludovicchabant/vim-gutentags to handle the issue: https://ludovic.chabant.com/devblog/2017/02/25/aaa-gamedev-with-vim/

His modification of ctrlp-py-matcher is here: https://github.com/ludovicchabant/ctrlp-py-matcher/blob/2f6947480203b734b069e5d9f69ba440db6b4698/autoload/pymatcher.py#L22, but it doesn't solve the issue.. it just tells you what happened.

It's not possible to get any tag search results.. so I am wondering if you know of a replacement for vim.eval which may be able to accept variables/strings which could contain BOM fields.

From this line, it seems that VIM is passing the items to pymatcher#PyMatch.

I wonder if you know of any other way to evaluate a variable in the vim namespace and transfer it to the python namespace other than going through the vim.eval route?

Jul 15 '18 00:07 alphaCTzo7G

@alphaCTzo7G

if you know of a replacement of vim.eval

None that I know of. The only thing that comes to mind: maybe it's possible to convert BOM to non-BOM before the eval takes place?

Jul 18 '18 13:07 FelikZ

@FelikZ.. actually I was wrong.. it turns out that different files had different problems. _identifier.py had valid utf-8 characters.. but ctags would take _identifier.py and cut certain strings incorrectly: https://github.com/universal-ctags/ctags/issues/1805

@b4n has rectified this issue in this PR: https://github.com/universal-ctags/ctags/pull/1807

There are other files which break it, such as misc.html, but misc.html has an invalid utf-8 character, which ctags seems to insert as is. vim.eval seems to break because of the invalid utf-8 character.

Perhaps there should be a warning that because of this limitation, users should use the PR from here: https://github.com/universal-ctags/ctags/pull/1807, otherwise vim.eval might break.

Exuberant-ctags, upon which universal-ctags is built, actually works fine. So if you use sudo apt-get install ctags, the ctags executable that is installed can handle this particular issue fine.. but it may have other side effects or missing features. I will update this as I find out more about this.

Jul 30 '18 16:07 alphaCTzo7G

The problem with misc.html can be avoided if there is a function in VIML which can parse the string in a:items and remove any invalid utf-8 character before passing it to python in vim.eval. Do you know if such a function exists?

Jul 30 '18 16:07 alphaCTzo7G
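
A minimal sketch of the clean-up step asked about above, done on the Python side rather than in VimL. It assumes the tag items are already available as raw byte strings; how to get them out of Vim without going through vim.eval is not shown here (Vim's Python interface also offers vim.bindeval, but whether it avoids the decoding error in this case is not verified). The name sanitize_items is hypothetical and not part of ctrlp-py-matcher.

```python
def sanitize_items(raw_items, encoding="utf-8"):
    """Return the items as str, replacing undecodable bytes with U+FFFD.

    Hypothetical helper: assumes each item is either bytes (raw tag line)
    or already a str, which is passed through unchanged.
    """
    cleaned = []
    for item in raw_items:
        if isinstance(item, bytes):
            # errors="replace" never raises; bad bytes become U+FFFD
            item = item.decode(encoding, errors="replace")
        cleaned.append(item)
    return cleaned


if __name__ == "__main__":
    # b"\xe9" is "e acute" in windows-1252 and is not valid UTF-8 here
    items = [b"valid_tag\tmisc.py", b"caf\xe9\tmisc.html"]
    print(sanitize_items(items))
```

The replaced character no longer matches the original tag name exactly, so this trades accuracy on the broken entries for not losing the whole tag list.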

@alphaCTzo7G misc.html is encoded in windows-1252, which obviously isn't UTF-8. For various reasons uctags doesn't translate any encodings (there are long topics you can find about that on the uctags issue tracker, but the main reason is that it's incredibly hard to do right -- and there are some cases where there is no good solution), and sadly one of the possible issues arising is having an output tag file with mixed encodings when processing multiple files. ectags never did anything with encodings either (actually, way less), and the difference you see is improper truncation in uctags (whereas ectags didn't truncate at all, and you can get this behavior in uctags with --pattern-length-limit=0), which is what I fixed in the linked PR. In the case of misc.html it's likely that uctags is simply outputting more tags than ectags did. Anyway, be it ectags or uctags, they are likely to output non-UTF-8 tags if the source files are not already UTF-8. This is even becoming more and more prominent with languages allowing Unicode identifiers, and uctags parsers supporting those.

However, after all my saying that uctags doesn't handle encoding conversion, it actually partly does if you tell it to through the --input-encoding (and/or --input-encoding-<LANG>) and --output-encoding options. This support is not perfect yet (I didn't follow this recently, but it used to be limited), and it requires the caller to decide on encodings, but it might help in some circumstances.

At any rate, you should probably find a way for invalid UTF-8 not to break your software, because it's unfortunately unlikely to be "fixed" anytime soon, and it's a fairly common thing (in most cases it'll be because the source was ISO 8859-something or Windows-something or alike, and on rare occasions the source will actually be utter garbage with mixed things and random bytes here and there -- been there, seen that).

Jul 30 '18 20:07 b4n
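
To see which entries of a tags file are affected by the mixed-encoding and truncation problems described above, something like the following sketch could help. It works on the raw bytes of a tags file and reports the lines that do not decode as UTF-8. The function name and the sample data are made up for illustration.

```python
def find_invalid_utf8_lines(data):
    """Return 1-based line numbers whose bytes are not valid UTF-8.

    `data` is the raw content of a tags file as bytes.
    """
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    sample = (
        b"foo\tfoo.py\t/^def foo/\n"
        b"caf\xe9\tmisc.html\t/^x/\n"   # windows-1252 byte, invalid UTF-8
        b"bar\tbar.py\t/^\xc3/\n"       # multi-byte character cut in half
    )
    print(find_invalid_utf8_lines(sample))
```

For a real file, the bytes can be read with open(path, "rb").read(). Both a legacy-encoded source file and a pattern truncated in the middle of a multi-byte character show up the same way here, so the line numbers only tell you where to look, not which of the two causes applies.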

Appreciate your advice.. the try/except that @ludovicchabant implemented might be helpful.. and I will try to see if vim.eval can be modified or if there is a function in vim that replaces illegal characters before passing to vim.eval.

Jul 31 '18 00:07 alphaCTzo7G

@alphaCTzo7G I ran into a similar situation when working on a specific project, and I finally found that it was because the fileencoding of the tags file was not utf-8 for some reason. You can type :set fileencoding to check. You just need to type :set fileencoding=utf-8 and save, then everything will be fine! By the way, I created a fork since this repository isn't updated anymore, and I added some new features to it. You can give my fork a try.

Dec 13 '20 02:12 cheng3100
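
Outside of Vim, a rough equivalent of re-saving the tags file as utf-8 could look like the sketch below. It is not what :set fileencoding does internally. It assumes that any line which is not valid UTF-8 is windows-1252 (the encoding mentioned above for misc.html), which may be wrong for other repos, and it falls back to latin-1 for bytes that windows-1252 leaves undefined. The function name is hypothetical.

```python
def to_utf8(data, fallback="cp1252"):
    """Re-encode tags file bytes to UTF-8, line by line.

    Lines that are already valid UTF-8 are kept as is. Other lines are
    assumed to be in `fallback` (an assumption, not detected), with
    latin-1 as a last resort since it can decode any byte.
    """
    out = []
    for line in data.split(b"\n"):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            try:
                text = line.decode(fallback)
            except UnicodeDecodeError:
                text = line.decode("latin-1")
        out.append(text.encode("utf-8"))
    return b"\n".join(out)


if __name__ == "__main__":
    raw = b"ok\tok.py\t/^ok/\ncaf\xe9\tmisc.html\t/^x/\n"
    converted = to_utf8(raw)
    converted.decode("utf-8")  # no longer raises
    print(converted)
```

A line that mixes valid UTF-8 with legacy bytes is decoded entirely with the fallback, so its non-ASCII characters may come out wrong. The result is still valid UTF-8, which is the part that matters for vim.eval.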
