Rework utf8 support

Open MobsyaBot opened this issue 6 years ago • 5 comments

Issue by cor3ntin, Friday Mar 09, 2018 at 09:55 GMT. Originally opened as https://github.com/aseba-community/aseba/issues/856


I'm opening this issue to keep track of Unicode-related issues, but it's low-priority work.

  • Remove all wstring except at Windows API boundaries.
  • Enforce utf8 internally.
  • Properly support Unicode variable names in the compiler and everywhere else we use variable names.

That implies:

  • normalizing the identifiers to a composed form
  • using the XID_Start / XID_Continue properties to determine whether a given identifier is valid
  • probably having a strong "identifier" type, rather than passing strings around

The issue is that most libs that do this are based on ICU, which is not too unreasonable since Qt already depends on it. However, we may want something more suitable if the compiler is to be compiled to wasm or embedded directly on a device.
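
As a rough illustration of the three implications above (composed-form normalization, XID checks, and a strong identifier type), here is a minimal sketch in Python rather than the project's C++, because Python's standard library already exposes both pieces without ICU. The `Identifier` class is hypothetical and not part of aseba. Note that `str.isidentifier()` also accepts a leading `_`, which goes slightly beyond XID_Start, and this sketch does not model any aseba-specific naming rules.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Hypothetical strong identifier type: always NFC-normalized and validated."""

    name: str

    def __post_init__(self):
        # Normalize to the composed form (NFC) so that visually identical
        # spellings of the same name compare equal.
        normalized = unicodedata.normalize("NFC", self.name)
        # str.isidentifier() requires the first character to be XID_Start
        # (or "_") and every following character to be XID_Continue.
        if not normalized.isidentifier():
            raise ValueError(f"invalid identifier: {self.name!r}")
        # The dataclass is frozen, so store the normalized value explicitly.
        object.__setattr__(self, "name", normalized)


# "e" followed by a combining acute accent and the precomposed "é"
# become the same identifier after normalization.
decomposed = Identifier("e\u0301")
composed = Identifier("\u00e9")
same = decomposed == composed
```

Because validation happens once at construction, code that receives an `Identifier` does not need to re-check the string, which is the main argument for a dedicated type over passing raw strings around.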

MobsyaBot avatar Apr 17 '18 17:04 MobsyaBot

Comment by cor3ntin Friday Mar 09, 2018 at 09:56 GMT


#746 #609

MobsyaBot avatar Apr 17 '18 17:04 MobsyaBot

Comment by marvelous Friday Mar 09, 2018 at 10:15 GMT


I can't recommend this article about Unicode identifiers enough: http://perl11.org/blog/unicode-identifiers.html. One of the points explained in the article that you might want to address is the issue of mixed scripts, where for example you might want to disallow both Greek and Cyrillic in the same program.
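
To make the mixed-script concern concrete, here is a rough Python sketch of such a check. It is only a heuristic: Python's standard library does not expose the Unicode Script property, so the hypothetical helper `scripts_used` guesses the script from the first word of each letter's Unicode character name. A real implementation would more likely rely on the Unicode script data files or a library that ships them.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Approximate the scripts in `text` from the first word of each letter's name.

    Heuristic only: this is NOT the Unicode Script property.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "GREEK SMALL LETTER ALPHA" -> "GREEK"
                scripts.add(name.split()[0])
    return scripts


def mixes_greek_and_cyrillic(text: str) -> bool:
    """Return True if `text` contains both Greek and Cyrillic letters."""
    return {"GREEK", "CYRILLIC"} <= scripts_used(text)
```

For example, a program text containing both `α` (Greek alpha) and `я` (Cyrillic ya) would be flagged, while one mixing Latin and Greek only would not.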

MobsyaBot avatar Apr 17 '18 17:04 MobsyaBot

Comment by cor3ntin Friday Mar 09, 2018 at 10:19 GMT


@marvelous this article makes very good points. It's something to consider once we get the basics working, which may take some time given how wstring is currently (mis)used everywhere!

MobsyaBot avatar Apr 17 '18 17:04 MobsyaBot

Is this still up to date?

mbonani avatar Feb 12 '19 08:02 mbonani

We should totally leave that open; it's still an unmitigated mess overall. I'm not sure when or if we will get to it, as it requires a significant amount of work. It works okay-ish with European languages, so it's not critical either.

cor3ntin avatar Feb 12 '19 09:02 cor3ntin