
More unicode characters in identifiers

Open c42f opened this issue 4 years ago • 5 comments

julia-vim treats Julia identifiers like Σ₁ as two separate words, so pressing * with the cursor on the identifier searches for only the character under the cursor. Other word-based operations such as w are affected in the same way.

It would be great if julia-vim treated a larger set of Julia identifier characters as word characters. The full rules are somewhat complex (see https://github.com/JuliaLang/Tokenize.jl/blob/bb4e28e06c1596da956a81969562c242ebc9b3bb/src/utilities.jl#L111) but maybe a good approximation would be possible?
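As a rough illustration of what such an approximation might look like, here is a minimal Python sketch that classifies characters by Unicode general category. The category sets and the names `START_CATEGORIES`, `CONTINUE_CATEGORIES`, and `is_identifier_like` are assumptions made for this example. They are not taken from Tokenize.jl, and the real rules contain special cases that this ignores.

```python
import unicodedata

# Assumed approximation only: categories allowed at the start of an identifier
# and in the remaining positions. The real Julia rules have more special cases.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Sc", "So"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Me", "Nd", "No", "Pc", "Sk"}


def _is_start(c: str) -> bool:
    # ASCII is handled explicitly so that symbols like '$' or '^' are rejected.
    if c.isascii():
        return c.isalpha() or c == "_"
    return unicodedata.category(c) in START_CATEGORIES


def _is_continue(c: str) -> bool:
    if c.isascii():
        return c.isalnum() or c in "_!"
    return unicodedata.category(c) in CONTINUE_CATEGORIES


def is_identifier_like(word: str) -> bool:
    """Return True if `word` looks like a Julia identifier under this approximation."""
    if not word:
        return False
    return _is_start(word[0]) and all(_is_continue(c) for c in word[1:])
```

Under these assumptions `is_identifier_like("Σ₁")` accepts the identifier from the report as a single word, because the subscript digit falls in the "other number" category that is allowed in non-initial positions.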

c42f, Aug 01 '19 06:08

Yes, it would be great, but unfortunately the option controlling this is iskeyword, whose format only allows specifying characters up to code point 255.

carlobaldassi, Aug 01 '19 07:08

Wow, it seems this really just isn't configurable. I had a look around and the best I could find was https://github.com/vim/vim/issues/576 which has had precious little attention.

Seems like an issue for upstream, alas.

c42f, Aug 01 '19 08:08

Here is the offending function:

https://github.com/vim/vim/blob/f2d8b7a0a69fd71018341755da5ce55d067b5923/src/charset.c#L849

So the Unicode character class is used unconditionally. We'd need some data structure to replace the 32-byte bitset that GET_CHARTAB uses, one that can efficiently cover the full set of Unicode code points.
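One candidate for such a data structure is a sorted table of code point ranges searched by binary search. The sketch below is in Python rather than Vim's C, and the class name `CodePointSet` and the example ranges are hypothetical, chosen only to illustrate the idea. It is not a proposal for the actual patch.

```python
from bisect import bisect_right


class CodePointSet:
    """A set of code points stored as sorted, non-overlapping inclusive ranges."""

    def __init__(self, ranges):
        merged = []
        for lo, hi in sorted(ranges):
            # Merge ranges that overlap or touch the previous one.
            if merged and lo <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        self._starts = [lo for lo, _ in merged]
        self._ends = [hi for _, hi in merged]

    def __contains__(self, code_point):
        # Find the last range starting at or before code_point, then check its end.
        i = bisect_right(self._starts, code_point) - 1
        return i >= 0 and code_point <= self._ends[i]


# Illustrative ranges only: ASCII digits, a block of subscript letters,
# and the subscript digits and operators.
extra_word_chars = CodePointSet([(0x30, 0x39), (0x1D62, 0x1D6A), (0x2080, 0x208E)])
```

Membership tests such as `ord("₁") in extra_word_chars` cost O(log n) in the number of ranges, and the table stays small because identifier characters tend to cluster in contiguous blocks.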

c42f, Aug 02 '19 21:08

Looking further into this issue, it seems Vim may simply be ignoring the Unicode "modifier letter" category (https://www.compart.com/en/unicode/category/Lm) and the "other number" category (https://www.compart.com/en/unicode/category/No), so characters in these categories are not treated as "word-like" characters.

The Unicode Consortium has published a technical report, "Unicode Identifier and Pattern Syntax", which discusses identifiers and clearly states that "modifier letters" should be considered part of identifiers. So perhaps Vim itself could just be changed to do the desired thing here by default.
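The two categories in question can be checked with Python's standard `unicodedata` module. This is only a small sketch for inspecting characters; the helper name `categories` is made up for the example.

```python
import unicodedata


def categories(identifier):
    """Return (character, Unicode general category) pairs for a string."""
    return [(c, unicodedata.category(c)) for c in identifier]


# The subscript digit in Σ₁ is an "other number" (No),
# and the subscript letter in xᵢ is a "modifier letter" (Lm).
print(categories("Σ₁"))
print(categories("xᵢ"))
```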

c42f, Oct 10 '19 02:10

Well, I dug a bit into what Unicode really has to say about this (rather a lot, in fact!) and submitted https://github.com/vim/vim/issues/5038.

c42f, Oct 10 '19 04:10