nickel Non-ASCII identifier support

Open toastal opened this issue 10 months ago • 7 comments

Is your feature request related to a problem? Please describe.

English is a weird language. It was the basis of ASCII, but many languages, even ones that also use the Latin script, don’t fit inside its limited character set. As a result, there is a bias towards unaccented Latin characters [A-Za-z]. Since there doesn’t appear to be a bicameral (upper/lower case) distinction requirement for identifiers, all writing scripts can & probably should be considered valid for a modern language that doesn’t carry the legacy bias of older languages. As it stands, I get unexpected token errors in situations that feel like they should be valid. Consider:

let Pokémon = {
	ID | std.number.PosNat,
	name | String,
	# …
} in

let SomeNorseGods = [| 'Odin, 'Freyr, 'Freyja, 'Þórr, 'Loki, 'Höðr, 'Sága |] in

let SomeGreekGods = [| 'Ἀφροδίτη, 'Ἀπόλλων, 'Ἄρης, 'Περσεφόνη |] in

let Buds = {
	คิว = { },
	แชมป์ = { },
	เมฆ = { },
} in

{ }

This produces unexpected token errors even though every identifier is written in a valid (according to humans) writing script.

Describe the solution you'd like

If it’s a ‘letter’ in a writing system block, it’s valid. I understand errors for names containing spaces or ‘symbol’ characters, but letters from all writing systems should be valid.
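
To make the proposed rule concrete, here is a minimal Python sketch (not Nickel's lexer, and not a proposal for its exact grammar) that checks the names from the example above in two ways. `is_letter_only` is a hypothetical helper implementing the strict "every character is a letter" reading. `is_unicode_identifier` leans on Python's built-in `str.isidentifier`, which follows the Unicode XID_Start / XID_Continue properties. The comparison suggests a letters-only rule may be too narrow: some Thai names contain combining marks that are not classified as letters.

```python
import unicodedata

# Names taken from the example in this issue.
NAMES = ["Pokémon", "Þórr", "Höðr", "Sága", "Ἀφροδίτη", "Περσεφόνη",
         "คิว", "แชมป์", "เมฆ"]


def is_letter_only(name: str) -> bool:
    """Hypothetical strict rule: every code point is a letter (category L*)."""
    # Normalize to NFC so that 'e' + combining accent becomes a single letter.
    name = unicodedata.normalize("NFC", name)
    return bool(name) and all(
        unicodedata.category(ch).startswith("L") for ch in name
    )


def is_unicode_identifier(name: str) -> bool:
    """Python's own identifier rule, based on XID_Start / XID_Continue."""
    return name.isidentifier()


if __name__ == "__main__":
    for name in NAMES:
        print(name, is_letter_only(name), is_unicode_identifier(name))
```

With the letters-only rule, คิว and แชมป์ are rejected because they contain nonspacing marks (general category Mn), while the XID-based rule accepts every name in the list and still rejects names containing spaces. A rule along the lines of "letter to start, letters and marks to continue" is probably closer to what the request intends.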

Describe alternatives you've considered

‘Romanize’ everything & deburr (tho this can lead to errors, as many languages distinguish between ‘e’ & ‘é’).
Convert everything to English, since English tends to drop accents: its writing system is already a mess & its spelling isn’t phonemic, so its speakers are used to memorizing weird or misspelled borrowings from other languages (tho there are exceptions: words like naïve, façade & jalapeño are often spelled with their accents, which would still fail).

Additional context

Aug 17 '23 09:08 toastal
