Remove usage of AnsiString, AnsiChar, and set of char. Use TCharacter for Unicode character classes.
I'll try to do this.
See also: http://docwiki.embarcadero.com/VCL/en/Character.TCharacter
In http://roman.yankovsky.me/?p=577 you wrote that you would do it yourself. Is there any progress?
Yes, this was my plan, but unfortunately there is no progress. I would very much appreciate your help.
Did you have a chance to start this work?
Not yet, sorry. I've realized that it is not necessary for my work right now.
I've looked at this task again and found that the original Unix yacc/lex tools currently have the same problem with ANSI charset usage. There are widely used workarounds, such as declaring UTF-8 sequences as special text in the lex file. This has left me with some doubts and qualms.
See, for example: http://compiler.su/nado-li-ispolzovat-yacc-lex-i-podobnye-instrumenty.php http://www.mkssoftware.com/support/kb/default.asp?article=54
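To make the workaround above concrete, here is a minimal sketch of what "declaring UTF-8 sequences" amounts to in a byte-oriented lexer. It is written in Python only for illustration and is not taken from any of the linked pages. The names NON_ASCII, IDENT and first_identifier are hypothetical, and the byte ranges are simplified (overlong encodings and surrogates are not excluded).

```python
import re
from typing import Optional

# Any multi-byte UTF-8 sequence, spelled out as byte ranges the way a
# byte-oriented lexer rule would need it (simplified, see note above).
NON_ASCII = (
    rb"(?:[\xC2-\xDF][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF]{2}"
    rb"|[\xF0-\xF4][\x80-\xBF]{3})"
)

# Identifier: starts with an ASCII letter, '_' or any non-ASCII character,
# then continues with the same set plus ASCII digits.
IDENT = re.compile(
    rb"(?:[A-Za-z_]|" + NON_ASCII + rb")"
    rb"(?:[A-Za-z0-9_]|" + NON_ASCII + rb")*"
)

def first_identifier(source: bytes) -> Optional[str]:
    """Return the identifier at the start of the UTF-8 input, or None."""
    match = IDENT.match(source)
    return match.group(0).decode("utf-8") if match else None
```

The limitation is visible here: the rule cannot tell a non-ASCII letter from non-ASCII punctuation, because it only sees bytes. Real Unicode character classes are what close that gap.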
But the Python implementation can work with Unicode directly, because Python strings are Unicode, and it lets you explicitly enable Unicode support for the regular expressions it uses: http://www.dabeaz.com/ply/ply.html#ply_nn22 (the "4.20 Miscellaneous Issues" section mentions Unicode)
There are also similar implementations that declare Unicode support: http://www.visoracle.com/download/freeware30/cssxml/sas.html
And the documentation for Bison, the GNU implementation of yacc, states that character encoding depends on the environment: see http://www.gnu.org/software/bison/manual/bison.html#Symbols
Conversion to string and TCharacter must be performed.
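As a rough illustration of the point about Python regular expressions, the sketch below uses only the standard `re` module, not PLY's own API. In Python 3, `str` patterns are Unicode-aware by default and `re.ASCII` switches that off. The names IDENT, ASCII_IDENT and find_identifiers are hypothetical.

```python
import re

# [^\W\d] means "a word character that is not a digit", i.e. a letter or '_'.
# With str patterns in Python 3, \w covers letters from any script.
IDENT = re.compile(r"[^\W\d]\w*")

# The same rule restricted to ASCII, comparable to an ANSI-only lexer.
ASCII_IDENT = re.compile(r"[^\W\d]\w*", re.ASCII)

def find_identifiers(text, pattern=IDENT):
    """Return every identifier-like token found in text."""
    return pattern.findall(text)
```

With the input "имя := name1 + 42", the Unicode-aware pattern finds both "имя" and "name1", while the ASCII-only pattern finds just "name1".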