aseba
Rework utf8 support
Issue by cor3ntin
Friday Mar 09, 2018 at 09:55 GMT
Originally opened as https://github.com/aseba-community/aseba/issues/856
I'm opening this issue to keep track of Unicode-related problems, but it's low-priority work.
- Remove all `wstring` usage except at Windows API boundaries.
- Enforce UTF-8 internally.
- Properly support Unicode variable names in the compiler and everywhere else we use variable names.
That implies
- normalizing the identifiers to a composed form
- using the `XID_Start`/`XID_Continue` properties to determine whether a given identifier is valid
- probably best to have a strong "identifier" type, rather than passing strings around
The issue is that most libraries that do this are based on ICU, which is not too unreasonable since Qt already depends on it.
However, we may want to use something more suitable if the compiler is to be compiled to wasm or embedded directly on a device.
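To make the strong "identifier" type idea concrete, here is a minimal C++17 sketch without ICU. Everything in it is hypothetical: `Identifier`, `decodeUtf8`, `isXidStartApprox` and `isXidContinueApprox` are illustration names, not existing Aseba code. The two predicates only accept ASCII and Latin-1 letters as a stand-in for the real `XID_Start`/`XID_Continue` tables, and normalization to a composed form is left out because it needs Unicode data.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: decode UTF-8 into code points, or return std::nullopt
// on malformed input (bad lead/continuation bytes, truncation, overlong
// forms, surrogates, values above U+10FFFF).
std::optional<std::u32string> decodeUtf8(std::string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b < 0x80) { cp = b; len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return std::nullopt;
        if (i + len > in.size()) return std::nullopt;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Smallest code point that legitimately needs `len` bytes.
        static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        out.push_back(cp);
        i += len;
    }
    return out;
}

// ASSUMPTION: placeholder for the real XID_Start table. Accepts only ASCII
// letters and the Latin-1 letters U+00C0..U+00FF (minus U+00D7 and U+00F7).
bool isXidStartApprox(char32_t c) {
    const bool asciiLetter = (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
    const bool latin1Letter = c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7;
    return asciiLetter || latin1Letter;
}

// ASSUMPTION: placeholder for the real XID_Continue table.
bool isXidContinueApprox(char32_t c) {
    return isXidStartApprox(c) || (c >= U'0' && c <= U'9') || c == U'_';
}

// Strong type: the only way to obtain one is through parse(), so any
// Identifier in the program is known to be valid, well-formed UTF-8.
class Identifier {
public:
    static std::optional<Identifier> parse(std::string_view utf8) {
        const auto cps = decodeUtf8(utf8);
        if (!cps || cps->empty() || !isXidStartApprox(cps->front()))
            return std::nullopt;
        for (std::size_t i = 1; i < cps->size(); ++i)
            if (!isXidContinueApprox((*cps)[i]))
                return std::nullopt;
        return Identifier(std::string(utf8));
    }
    const std::string& str() const { return name_; }
    friend bool operator==(const Identifier& a, const Identifier& b) {
        return a.name_ == b.name_;
    }

private:
    explicit Identifier(std::string name) : name_(std::move(name)) {
        assert(!name_.empty());  // invariant established by parse()
    }
    std::string name_;
};
```

With this shape, `Identifier::parse("speed_1")` yields a value while `Identifier::parse("1abc")` and malformed byte sequences yield `std::nullopt`, so the compiler could take `Identifier` parameters instead of raw strings. Swapping the two placeholder predicates for table-driven ones would not change the interface.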
Comment by marvelous
Friday Mar 09, 2018 at 10:15 GMT
I can't recommend this article about Unicode identifiers enough: http://perl11.org/blog/unicode-identifiers.html
One of the points explained in the article that you might want to address is the issue of mixed scripts, where for example you might want to disallow both Greek and Cyrillic in the same program.
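As a rough illustration of the mixed-script point, a check could look like the C++17 sketch below. It is only an approximation under stated assumptions: `Script`, `approximateScript` and `mixesGreekAndCyrillic` are hypothetical names, and scripts are guessed from two Unicode blocks (Greek and Coptic, U+0370..U+03FF, and Cyrillic, U+0400..U+04FF) rather than from the real Unicode Script property.

```cpp
#include <cassert>
#include <string_view>

enum class Script { Other, Greek, Cyrillic };

// ASSUMPTION: block-based approximation. A real implementation would look
// up the Unicode Script property instead of raw block ranges.
Script approximateScript(char32_t c) {
    assert(c <= 0x10FFFF);  // precondition: c is a Unicode code point
    if (c >= 0x0370 && c <= 0x03FF) return Script::Greek;     // Greek and Coptic block
    if (c >= 0x0400 && c <= 0x04FF) return Script::Cyrillic;  // Cyrillic block
    return Script::Other;
}

// Returns true if the decoded text contains at least one character from
// each of the two blocks, i.e. the mix we might want to reject.
bool mixesGreekAndCyrillic(std::u32string_view text) {
    bool greek = false;
    bool cyrillic = false;
    for (const char32_t c : text) {
        switch (approximateScript(c)) {
            case Script::Greek:    greek = true;    break;
            case Script::Cyrillic: cyrillic = true; break;
            case Script::Other:    break;
        }
    }
    return greek && cyrillic;
}
```

Run over a whole program's identifiers, this would flag a name that mixes Greek α (U+03B1) with Cyrillic а (U+0430), while all-Greek or all-Latin names pass.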
Comment by cor3ntin
Friday Mar 09, 2018 at 10:19 GMT
@marvelous this article makes very good points. It's something to consider once we get the basics working, which may take some time given how `wstring` is currently (mis)used everywhere!
Is this still up to date?
We should totally leave this open; it's still an unmitigated mess overall. I'm not sure when or if we will get to it, as it requires a significant amount of work. It works okay-ish with European languages, so it's not critical either.