unicode.vim
Would Perl-style short names with :UnicodeSearch be a good idea?
I have factored the code that matches Perl-style "short" script: Char names out of unicode#DigraphNew() and into its own function:
fu! unicode#FindScriptChars(arg) abort "{{{2
  " Support for Perl-style "short" `script: Letter` and `script: letter` charnames
  if len(a:arg) == 0
    echoerr "Argument is empty or unspecified"
    return []
  endif
  let regexes = []
  " Test for a perl-style short charname i.e `^\s*(<script>)\s*:\s*(<charname>)\s*$`.
  " This is safe since actual Unicode names do not contain any colons.
  let short = matchlist(a:arg, '^\s*\(\S.\{-}\S\)\s*:\s*\(\S.*\S\@<=\)\s*$')
  if empty(short)
    " No match: search for a regex normally
    let regexes = [a:arg]
  else
    let [thescript, thename] = short[1:2]
    " If the name contains any uppercase search for "^<script> capital letter <name>"
    " else search for "^<script> small letter <name>",
    " then for "^<script> letter <name>", and return the
    " results for the first one which has any hits.
    " Note that the "short" lookups anchor to the start of the names
    " while "regular" regex lookup does not!
    let thecase = thename =~ '\u' ? 'capital' : 'small'
    call add(regexes, printf('^%s %s letter %s', thescript, thecase, thename))
    call add(regexes, printf('^%s letter %s', thescript, thename))
  endif
  " Compress whitespace in the regexes
  call map(regexes, "substitute(v:val, '\\s\\+', ' ', 'g')")
  " Loop through the regexes
  for regex in regexes
    let unichars = unicode#FindUnicodeBy(regex)
    if len(unichars)
      return unichars
    endif
  endfor
  " Found nothing. Return empty list!
  return []
endfu
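To make the lookup order easier to see, here is a small sketch that reproduces only the regex-building steps from the function above, without calling unicode#FindUnicodeBy(). The name ScriptCharRegexes() is hypothetical and exists only for this illustration; it is not part of unicode.vim.

```vim
" Hypothetical helper (not part of unicode.vim): returns the list of
" regexes that would be tried, in order, for a given argument.
function! ScriptCharRegexes(arg) abort
  let short = matchlist(a:arg, '^\s*\(\S.\{-}\S\)\s*:\s*\(\S.*\S\@<=\)\s*$')
  if empty(short)
    " No "script: name" form: the argument is used as a plain regex
    return [a:arg]
  endif
  let [thescript, thename] = short[1:2]
  " =~# forces a case-sensitive match regardless of 'ignorecase'
  let thecase = thename =~# '\u' ? 'capital' : 'small'
  let regexes = []
  call add(regexes, printf('^%s %s letter %s', thescript, thecase, thename))
  call add(regexes, printf('^%s letter %s', thescript, thename))
  " Compress runs of whitespace to a single space
  return map(regexes, "substitute(v:val, '\\s\\+', ' ', 'g')")
endfunction

echo ScriptCharRegexes('greek: alpha')
" ['^greek small letter alpha', '^greek letter alpha']
echo ScriptCharRegexes('Greek : Alpha')
" ['^Greek capital letter Alpha', '^Greek letter Alpha']
echo ScriptCharRegexes('latin :  sharp   s')
" ['^latin small letter sharp s', '^latin letter sharp s']
echo ScriptCharRegexes('alpha')
" ['alpha']
```

Whether a pattern such as `^greek small letter alpha` actually finds anything depends on how unicode#FindUnicodeBy() treats case, which this sketch does not attempt to model.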
Since it is now just a lookup function, it returns a list of all character entries that match the regex, rather than barfing when it gets anything other than a single match.
So far I have only used it in three places: standalone, in the minimally changed unicode#DigraphNew() function, and in my custom :UnicodeScriptChars command (along with the function that powers it, which uses my private interpolate-by-name autoload function for formatting).
Do you think a PR that integrates this into :UnicodeSearch would be a good idea?
I don't think I have ever stumbled over the Perl-style short names (or whatever they are called). But if you think this is useful, I am willing to merge a PR. I just don't know whether I will be able to test it.