julia-vim
More unicode characters in identifiers
julia-vim considers Julia identifiers like Σ₁ to be two separate words, so searching with * while positioned over the identifier searches for only the current character (other word-based operations such as w are affected in the same way).
It would be great if julia-vim would support a larger set of julia identifier characters as words. The full rules are somewhat complex (see https://github.com/JuliaLang/Tokenize.jl/blob/bb4e28e06c1596da956a81969562c242ebc9b3bb/src/utilities.jl#L111) but maybe a good approximation would be possible?
Yes, it would be great, but unfortunately the option controlling this is iskeyword, whose format only allows specifying ASCII characters.
Wow, it seems this really just isn't configurable. I had a look around and the best I could find was https://github.com/vim/vim/issues/576 which has had precious little attention.
Seems like an issue for upstream, alas.
Here is the offending function:
https://github.com/vim/vim/blob/f2d8b7a0a69fd71018341755da5ce55d067b5923/src/charset.c#L849
So the Unicode character class is used unconditionally. We'd need some data structure to replace the 32-byte bitset which GET_CHARTAB uses, but which can efficiently cover the set of Unicode code points.
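For illustration, one common shape for such a structure is a sorted table of code point ranges searched by bisection. The sketch below is in Python rather than vim's C, and the ranges in WORD_RANGES are hand-picked assumptions for the example; they are not vim's tables and not Julia's real identifier rules.

```python
from bisect import bisect_right

# Hypothetical, hand-picked inclusive ranges (sorted, non-overlapping).
# These are NOT vim's or Julia's real tables; they only illustrate the idea.
WORD_RANGES = [
    (0x0030, 0x0039),  # ASCII digits
    (0x0041, 0x005A),  # A-Z
    (0x005F, 0x005F),  # underscore
    (0x0061, 0x007A),  # a-z
    (0x0391, 0x03A1),  # Greek capitals Alpha..Rho
    (0x03A3, 0x03A9),  # Greek capitals Sigma..Omega
    (0x2080, 0x2089),  # subscript digits
]
STARTS = [lo for lo, _ in WORD_RANGES]


def is_word_char(ch: str) -> bool:
    """Return True if ch falls inside one of the ranges (O(log n) lookup)."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    return i >= 0 and cp <= WORD_RANGES[i][1]


def is_single_word(text: str) -> bool:
    """Return True if every character of text counts as a word character."""
    return all(is_word_char(c) for c in text)
```

With this table, is_single_word("\u03a3\u2081") (Greek capital sigma followed by subscript one) is true, while a string containing "+" is not. A table like this costs a few bytes per range instead of one bit per code point, which is what makes it practical beyond the first 256 characters.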
Looking further into this issue, it seems vim may just be ignoring the "modifier letter" Unicode category https://www.compart.com/en/unicode/category/Lm and the "other number" category https://www.compart.com/en/unicode/category/No, so characters in these categories are not treated as "word-like" characters.
The unicode consortium has published a technical report Unicode Identifier and Pattern Syntax discussing identifiers which clearly states that "modifier letters" should be considered part of identifiers. So perhaps vim itself could just be changed to do the desired thing here by default.
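As a quick check of the categories mentioned above, Python's standard unicodedata module reports the general category of each character. This is only a sketch, and it assumes the identifier in question is GREEK CAPITAL LETTER SIGMA (U+03A3) followed by SUBSCRIPT ONE (U+2081); the helper name categories is made up for the example.

```python
import unicodedata


def categories(identifier: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode general category."""
    return [(ch, unicodedata.category(ch)) for ch in identifier]


# Assumed spelling of the identifier: U+03A3 followed by U+2081.
sigma_one = "\u03a3\u2081"
print(categories(sigma_one))
```

Under that assumption the sigma is reported as Lu (uppercase letter) and the subscript one as No (other number), so the two halves of the identifier land in different classes. A modifier letter such as U+02B0 is reported as Lm.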
Well, I dug a bit into what unicode really has to say about this (rather a lot, in fact!) and submitted https://github.com/vim/vim/issues/5038.