Non-ASCII identifier support

Open • toastal opened this issue 10 months ago • 7 comments

Is your feature request related to a problem? Please describe.

English is a weird language. It was the basis of ASCII, but many languages, even ones that also use the Latin script, don’t fit inside its limited character set. As a result, there is a bias towards unaccented Latin characters [A-Za-z]. Since there doesn’t appear to be a bicameral distinction requirement, all writing scripts can & probably should be considered valid for a modern language that doesn’t have the legacy bias of older languages. As such, I get unexpected token errors for situations that feel like they should be valid. Consider:

let Pokémon = {
	ID | std.number.PosNat,
	name | String,
	# …
} in

let SomeNorseGods = [| 'Odin, 'Freyr, 'Freyja, 'Þórr, 'Loki, 'Höðr, 'Sága |] in

let SomeGreekGods = [| 'Ἀφροδίτη, 'Ἀπόλλων, 'Ἄρης, 'Περσεφόνη |] in

let Buds = {
	คิว = { },
	แชมป์ = { },
	เมฆ = { },
} in

{ }

This produces unexpected token errors even though every name is written in a valid (according to humans) writing script.

Describe the solution you'd like

If it’s a ‘letter’ in a writing system block, it’s valid. I understand errors for names with spaces or ‘symbol’ characters, but letters from all writing systems should be valid.
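To make the rule concrete, here is a rough Python sketch of a "letters from any script" check built on Unicode general categories. The function names (`letters_only`, `is_identifier`) are hypothetical and this is not how Nickel's lexer works; it only illustrates the idea. One wrinkle it surfaces: a strict letters-only rule would still reject names like คิว, because Thai vowel signs are combining marks (category Mn) rather than letters, so marks probably need to be allowed after the first character.

```python
import unicodedata


def _is_letter(ch: str) -> bool:
    # Unicode general categories Lu, Ll, Lt, Lm, Lo all start with "L".
    return unicodedata.category(ch).startswith("L")


def _is_continue(ch: str) -> bool:
    # Assumption: after the first character, also allow combining marks (M*),
    # decimal digits (Nd) and underscore.
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat == "Nd" or ch == "_"


def letters_only(name: str) -> bool:
    """Strict reading of the rule: every character must be a letter."""
    name = unicodedata.normalize("NFC", name)
    return bool(name) and all(_is_letter(c) for c in name)


def is_identifier(name: str) -> bool:
    """Looser reading: letter (or underscore) first, then letters, marks, digits."""
    name = unicodedata.normalize("NFC", name)
    if not name:
        return False
    first, rest = name[0], name[1:]
    return (_is_letter(first) or first == "_") and all(_is_continue(c) for c in rest)


if __name__ == "__main__":
    for name in ["Pokémon", "Þórr", "Ἀφροδίτη", "คิว", "เมฆ", "two words", "1abc"]:
        print(name, letters_only(name), is_identifier(name))
```

The standard formalization of this idea is the XID_Start / XID_Continue pair from Unicode Standard Annex #31, which is what Python's own `str.isidentifier()` is based on; a real implementation would likely lean on those properties instead of hand-picked categories.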

Describe alternatives you've considered

  • ‘Romanize’ everything & deburr (tho this can lead to errors, as many languages distinguish between ‘e’ & ‘é’).
  • Convert everything to English, which tends to drop accents: its writing system is already a mess &, since its words aren’t spelled phonemically, its speakers are used to memorizing weird or misspelled borrowings from other languages (tho words like naïve, façade & jalapeño are often still spelled with their accents, so they would still fail).
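As a quick illustration of why the deburr route is lossy, here is a small Python sketch. The `deburr` helper is hypothetical: it decomposes with NFKD and drops combining marks, which is one common way to strip accents. Distinct words collapse into the same identifier, and letters without a decomposition (Þ, ð) stay non-ASCII anyway, so the result can still be rejected.

```python
import unicodedata


def deburr(text: str) -> str:
    """Strip accents by decomposing (NFKD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    # Two different words end up as the same name.
    print(deburr("résumé"), deburr("resume"))
    # Þ and ð have no decomposition, so the output is still not ASCII.
    print(deburr("Þórr"), deburr("Höðr"))
```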

Additional context

toastal, Aug 17 '23 09:08