unicode.vim

Would Perl-style short names with :UnicodeSearch be a good idea?

Open: bpj opened this issue 4 years ago • 1 comment

I have separated out the code for matching Perl-style "short" character names of the form "script: Char" from unicode#DigraphNew() into a separate function like this:

fu! unicode#FindScriptChars(arg) abort "{{{2
    " Support for Perl-style "short" `script: Letter` and `script: letter` charnames
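    if len(a:arg) == 0
        " Use :echomsg with error highlighting instead of :echoerr: inside an
        " `abort` function :echoerr counts as an error and would abort the
        " function before the `return []` below is reached.
        echohl ErrorMsg | echomsg "Argument is empty or unspecified" | echohl None
        return []
    endif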
    let regexes = []
    " Test for a Perl-style short charname, i.e. `^\s*(<script>)\s*:\s*(<charname>)\s*$`.
    " This is safe since actual Unicode names do not contain any colons.
    let short = matchlist(a:arg, '^\s*\(\S.\{-}\S\)\s*:\s*\(\S.*\S\@<=\)\s*$')
    if empty(short)
        " No match: search for a regex normally
        let regexes = [a:arg]
    else
        let [thescript, thename] = short[1:2]
        " If the name contains any uppercase search for "^<script> capital letter <name>"
        " else search for "^<script> small letter <name>",
        " then for "^<script> letter <name>", and return the
        " results for the first one which has any hits.
        " Note that the "short" lookups anchor to the start of the names
        " while "regular" regex lookup does not!
        " Use =~# so the uppercase test is case-sensitive regardless of 'ignorecase'
        let thecase = thename =~# '\u' ? 'capital' : 'small'
        call add(regexes, printf('^%s %s letter %s', thescript, thecase, thename))
        call add(regexes, printf('^%s letter %s', thescript, thename))
    endif
    " Compress whitespace in the regexes
    call map(regexes, "substitute(v:val, '\\s\\+', ' ', 'g')")
    " Loop through the regexes
    for regex in regexes
        let unichars = unicode#FindUnicodeBy(regex)
        if len(unichars)
            return unichars
        endif
    endfor
    " Found nothing. Return empty list!
    return []
endfu

Since it is now just a lookup function, it returns a list of all char entries that match the regex, rather than barfing if it gets anything other than a single match.
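
To make the matching rules easier to check in isolation, here is a minimal sketch that builds only the candidate regexes and performs no lookup. The helper name ShortNameRegexes() is hypothetical and not part of unicode.vim; it reuses the same pattern as the function above and skips the call to unicode#FindUnicodeBy(), so it can be sourced without the plugin.

```vim
" Hypothetical helper (not part of unicode.vim): return the regexes that
" the short-name logic would try, in order, without doing any lookup.
function! ShortNameRegexes(arg) abort
    let short = matchlist(a:arg, '^\s*\(\S.\{-}\S\)\s*:\s*\(\S.*\S\@<=\)\s*$')
    if empty(short)
        " No "script: name" shape: treat the argument as an ordinary regex
        return [a:arg]
    endif
    let [thescript, thename] = short[1:2]
    " =~# forces a case-sensitive test regardless of 'ignorecase'
    let thecase = thename =~# '\u' ? 'capital' : 'small'
    let regexes = []
    call add(regexes, printf('^%s %s letter %s', thescript, thecase, thename))
    call add(regexes, printf('^%s letter %s', thescript, thename))
    " Compress runs of whitespace to a single space
    return map(regexes, "substitute(v:val, '\\s\\+', ' ', 'g')")
endfunction

echo ShortNameRegexes('greek: Alpha')
" ['^greek capital letter Alpha', '^greek letter Alpha']
echo ShortNameRegexes('hebrew: alef')
" ['^hebrew small letter alef', '^hebrew letter alef']
echo ShortNameRegexes('ARROW')
" ['ARROW']
```

The first regex that yields any hits wins, so "greek: Alpha" would only fall back to the plain "letter" form if no "capital letter" name matched.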

As yet I have only used it standalone, in the minimally changed unicode#DigraphNew() function and in my custom :UnicodeScriptChars command (and the function which powers it, using my private interpolate-by-name autoload function for formatting).

Now I wonder if you think a PR which would integrate this into :UnicodeSearch would be a good idea?

bpj, Jun 15 '21 11:06

I don't think I have ever stumbled over the Perl-style short names (or whatever they are called). But if you think this is useful, I am willing to integrate a PR. I just don't know if I am able to test this.

chrisbra, Jun 15 '21 11:06