autocomplete-plus
Problem with Unicode support
Description
Auto-completion doesn't work properly when writing Tamil text. This is probably because the unicode-helpers.js file has only the codepoints from the Letter Unicode category, whereas it should have all codepoints that have the Alphabetic property. For example, it has 0B95 for க, but not 0BC0, which is a vowel mark that combines with க to make கீ.
To give an idea of how weird this is, Atom gives completion when I type the equivalent of "SaRaGaMa", but not when I type the equivalent Tamil text of "SaReGaMa": having any vowel other than a (அ) in the prefix disables the autocompletion.
A lot of codepoints for many Indic scripts (and some other Asian scripts) are placed in the M (Mark) categories in Unicode, and then given the Other_Alphabetic property (search for Other_Alphabetic in https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt). This means the current completion probably doesn't work properly for any of those languages. The solution, as far as I can tell, is just to add all the Other_Alphabetic codepoints from that page to unicode-helpers.js.
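To illustrate the difference between the two properties, here is a minimal sketch. It assumes a JavaScript runtime with ES2018 Unicode property escapes (for example a recent Node.js), which the Atom build in this report may not provide. The names LETTER, ALPHABETIC and trailingWord are hypothetical and are not taken from unicode-helpers.js or from autocomplete-plus.

```javascript
// Hypothetical character classes; neither is copied from unicode-helpers.js.
const LETTER = "\\p{L}";               // General_Category = Letter only
const ALPHABETIC = "\\p{Alphabetic}";  // Alphabetic property, which includes Other_Alphabetic

// Return the run of word characters that ends the text, or "" if there is none.
// This only imitates how a completion prefix might be extracted (an assumption).
function trailingWord(text, charClass) {
  const match = new RegExp(`(?:${charClass})+$`, "u").exec(text);
  return match ? match[0] : "";
}

const ka = "\u0B95";     // க, TAMIL LETTER KA (a Letter)
const signIi = "\u0BC0"; // ீ, TAMIL VOWEL SIGN II (a Mark with Other_Alphabetic)

console.log(trailingWord(ka, LETTER));              // the letter alone is matched
console.log(trailingWord(ka + signIi, LETTER));     // empty: the vowel sign ends the run
console.log(trailingWord(ka + signIi, ALPHABETIC)); // both codepoints are matched
```

With the Letter-only class, a prefix that ends in a vowel sign yields no word at all, which would be consistent with the behavior described above.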
Steps to Reproduce
- Type (or paste) எழுத்துக்கள் on the first line of a file
- Type (or paste) எழுத்து on the second line
Expected behavior: Completion to எழுத்துக்கள் should appear
Actual behavior: No completion appears - and pressing Ctrl-Space does nothing either
Reproduces how often: 100%
Versions
Version 1.27.1 on Windows 7 64-bit
(This is all after enabling 'Extended Unicode Support'. Without that option checked, no completion happens at all; with it checked, completion happens only for the 'a' vowel, as mentioned in the second paragraph above. Also, changing between the Sequence and Symbol modes doesn't seem to make any difference.)
Thanks for the report! I can reproduce with your steps on macOS 10.12.6 and Atom 1.29.0-dev-e31c972d3.