nickel
Non-ASCII identifier support
Is your feature request related to a problem? Please describe.
English is a weird language. It was the basis of ASCII, but many languages, even ones that also use the Latin script, don’t fit inside its limited character set. As a result, there is a bias towards unaccented Latin characters [A-Za-z]. Since identifiers don’t appear to require a bicameral (uppercase/lowercase) distinction, all writing scripts can and probably should be considered valid in a modern language that doesn’t carry the legacy bias of older languages. As it stands, I get `unexpected token` errors in situations that feel like they should be valid. Consider:
```nickel
let Pokémon = {
  ID | std.number.PosNat,
  name | String,
  # …
} in
let SomeNorseGods = [| 'Odin, 'Freyr, 'Freyja, 'Þórr, 'Loki, 'Höðr, 'Sága |] in
let SomeGreekGods = [| 'Ἀφροδίτη, 'Ἀπόλλων, 'Ἄρης, 'Περσεφόνη |] in
let Buds = {
  คิว = { },
  แชมป์ = { },
  เมฆ = { },
} in
{ }
```
This produces `unexpected token` errors even though every name is written in a script that humans consider perfectly valid.
Describe the solution you'd like
If a character is a ‘letter’ in some writing system, it should be valid in an identifier. I understand errors for names containing spaces or ‘symbols’, but letters from every writing system should be accepted.
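One way to make “if it’s a letter, it’s valid” precise is the identifier syntax from Unicode Standard Annex #31 (the XID_Start/XID_Continue properties), which Rust and Python, for example, build on. Below is a minimal, hypothetical sketch in Rust (the language Nickel is implemented in), not Nickel’s actual lexer. It assumes the current ASCII rule has roughly the shape `_*[a-zA-Z][_a-zA-Z0-9-']*` and widens the letter and digit classes to every script. Rust’s standard library doesn’t expose XID_Start/XID_Continue, so the sketch approximates them with `char::is_alphabetic`/`char::is_alphanumeric` plus a small, hand-picked and incomplete list of combining-mark ranges; `is_valid_identifier` and the other helpers are illustrative names.

```rust
/// A few combining-mark ranges that `char::is_alphanumeric` may not cover.
/// Deliberately incomplete: a real lexer would use full XID tables.
fn is_combining_mark(c: char) -> bool {
    matches!(
        c,
        '\u{0300}'..='\u{036F}'     // Combining Diacritical Marks (e.g. decomposed é)
            | '\u{0E31}'            // Thai MAI HAN-AKAT
            | '\u{0E34}'..='\u{0E3A}' // Thai vowel signs SARA I..PHINTHU
            | '\u{0E47}'..='\u{0E4E}' // Thai tone marks, THANTHAKHAT, etc.
    )
}

/// Characters allowed after the first letter: any alphanumeric character
/// in any script, one of the combining marks above, or the ASCII extras
/// `_`, `-` and `'` (assumed to match the existing ASCII rule).
fn is_ident_continue(c: char) -> bool {
    c.is_alphanumeric() || is_combining_mark(c) || matches!(c, '_' | '-' | '\'')
}

/// Assumed shape: optional leading underscores, then a letter from any
/// script, then any number of continuation characters.
fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.trim_start_matches('_').chars();
    match chars.next() {
        Some(first) if first.is_alphabetic() => chars.all(is_ident_continue),
        _ => false,
    }
}

fn main() {
    let names = [
        "Pokémon", "Þórr", "Höðr", "Ἀφροδίτη", "คิว", "แชมป์", "เมฆ",
        "has space", "a+b", "1st",
    ];
    for name in names {
        println!("{name} -> {}", is_valid_identifier(name));
    }
}
```

Combining marks are the subtle part. Thai names like แชมป์ end in a mark that isn’t a letter on its own, and an accent may be stored as a separate combining character, so a plain “is it a letter?” test is not quite enough. A real implementation would more likely use complete XID tables (for example, from the `unicode-ident` crate) and might normalize names to NFC first, so that composed and decomposed spellings of é refer to the same identifier.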
Describe alternatives you've considered
- ‘Romanize’ everything and strip the accents (deburr). This can lead to errors, though, since many languages distinguish between ‘e’ and ‘é’.
- Convert everything to English. English tends to drop accents anyway: its writing system is already a mess, and since its spelling isn’t phonemic, its speakers are used to memorizing odd or misspelled borrowings from other languages. There are exceptions, though: words like naïve, façade, and jalapeño are often spelled with their accents, and those would still fail.
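To make the deburring concern concrete, here is a small, hypothetical Rust sketch. `deburr` below is an illustrative accent-stripping function that only covers lowercase Latin-1 vowels plus ‘ç’ and ‘ñ’ (real deburr utilities cover much more), and `collisions` reports distinct names that end up with the same stripped form. The French words “pêche” (peach, fishing) and “péché” (sin) are a real example of two different words that collapse to one.

```rust
use std::collections::HashMap;

/// Hypothetical, deliberately tiny accent-stripping table for lowercase
/// Latin-1 letters; anything not listed passes through unchanged.
fn deburr(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            '\u{e0}'..='\u{e5}' => 'a', // à á â ã ä å
            '\u{e7}' => 'c',            // ç
            '\u{e8}'..='\u{eb}' => 'e', // è é ê ë
            '\u{ec}'..='\u{ef}' => 'i', // ì í î ï
            '\u{f1}' => 'n',            // ñ
            '\u{f2}'..='\u{f6}' => 'o', // ò ó ô õ ö
            '\u{f9}'..='\u{fc}' => 'u', // ù ú û ü
            other => other,
        })
        .collect()
}

/// Groups the names by their deburred form and keeps only the groups
/// where two or more distinct names collapsed into the same key.
fn collisions<'a>(names: &[&'a str]) -> Vec<(String, Vec<&'a str>)> {
    let mut groups: HashMap<String, Vec<&'a str>> = HashMap::new();
    for &name in names {
        groups.entry(deburr(name)).or_default().push(name);
    }
    let mut clashes: Vec<_> = groups.into_iter().filter(|(_, v)| v.len() > 1).collect();
    clashes.sort(); // deterministic order for printing
    clashes
}

fn main() {
    let names = ["pêche", "péché", "jalapeño", "façade"];
    for (key, group) in collisions(&names) {
        println!("{key}: {group:?}");
    }
}
```

With precomposed input, “pêche” and “péché” both become `peche`, so two distinct names would silently turn into the same identifier.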
Additional context