aseba
Rework utf8 support
I'm opening this issue to keep track of Unicode-related issues, but it's low-priority work.
- Remove all uses of wstring except at Windows API boundaries.
- Enforce UTF-8 internally.
- Properly support Unicode variable names in the compiler and everywhere else we use variable names.
That implies:
  - normalizing identifiers to a composed form
  - using the XID_Start/XID_Continue properties to determine whether a given identifier is valid
  - probably having a strong "identifier" type rather than passing strings around
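A minimal sketch of the identifier rules above, in Python for brevity, assuming NFC as the composed form. `Identifier` and `from_utf8` are hypothetical names, not part of aseba. CPython's `str.isidentifier()` checks XID_Start/XID_Continue for the Unicode version bundled with the interpreter (and also accepts `_` as a first character); a C++ version would need ICU or an equivalent table-driven library to do the same.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Hypothetical strong identifier type: always NFC-normalized and validated."""

    text: str

    def __post_init__(self) -> None:
        # Normalize to the composed form (NFC, assumed here) so that
        # "e" + U+0301 COMBINING ACUTE ACCENT and precomposed U+00E9 compare equal.
        normalized = unicodedata.normalize("NFC", self.text)
        # CPython's isidentifier() tests XID_Start for the first character
        # (plus "_") and XID_Continue for the rest.
        if not normalized.isidentifier():
            raise ValueError(f"invalid identifier: {self.text!r}")
        # The dataclass is frozen, so bypass __setattr__ to store the normalized form.
        object.__setattr__(self, "text", normalized)

    @classmethod
    def from_utf8(cls, raw: bytes) -> "Identifier":
        # Enforce UTF-8 at the boundary: malformed input raises
        # UnicodeDecodeError, which is a subclass of ValueError.
        return cls(raw.decode("utf-8"))
```

Because equality and hashing are defined on the normalized text, two spellings that differ only in normalization resolve to the same variable, while something like `"2fast"` is rejected when the `Identifier` is constructed rather than deep inside the compiler.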
The catch is that most libraries that do this are based on ICU, which is not too unreasonable since Qt already depends on it.
However, we may want something lighter if the compiler is to be compiled to WebAssembly (wasm) or embedded directly on a device.
#746 #609
I can't recommend this article about Unicode identifiers enough: http://perl11.org/blog/unicode-identifiers.html
One of the points it explains that you might want to address is the issue of mixed scripts: for example, you might want to disallow using both Greek and Cyrillic in the same program.
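The mixed-script check mentioned above could be approximated as follows. This is a rough heuristic, not the Unicode Script property: Python's `unicodedata` module does not expose scripts, so the sketch takes the first word of each letter's Unicode character name (e.g. `GREEK SMALL LETTER ALPHA`). The function names and the default Greek/Cyrillic pair are hypothetical choices for illustration.

```python
import unicodedata
from typing import Iterable


def scripts_in(identifier: str) -> set[str]:
    """Approximate the scripts used by an identifier from character names.

    Heuristic only: a real implementation would use the Unicode Script
    (or Script_Extensions) property instead of name prefixes.
    """
    found = set()
    for ch in identifier:
        if not ch.isalpha():
            continue  # digits and "_" are treated as script-neutral here
        name = unicodedata.name(ch, "")
        if name:
            found.add(name.split(" ", 1)[0])  # e.g. "LATIN", "GREEK", "CYRILLIC"
    return found


def uses_conflicting_scripts(
    identifiers: Iterable[str],
    conflicting: frozenset = frozenset({"GREEK", "CYRILLIC"}),
) -> bool:
    """Return True if every script in `conflicting` appears somewhere in the program."""
    seen = set()
    for ident in identifiers:
        seen |= scripts_in(ident)
    return conflicting <= seen
```

Checking at program level rather than per identifier matches the article's example of disallowing Greek and Cyrillic in the same program, while still allowing, say, Latin and Greek identifiers side by side.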
@marvelous This article makes very good points. It's something to consider once we get the basics working, which may take some time given how wstring is currently (mis)used everywhere!