Case conversion functions don't work with non-ASCII characters
Tested on Windows 10 desktop target:
Print "Привет".ToUpper()
Print "Привет".ToLower()
Print "привет".Capitalize()
Print "ÄÄ".ToLower()
Print "ää".ToUpper()
Print "ää".Capitalize()
output:
Привет
Привет
привет
ÄÄ
ää
ää
On Mac OS this is solved by setting codepage to en_US.UTF-8 before calling std::toupper() or std::tolower()
#include <clocale>
...
const char *lcc = std::setlocale(LC_CTYPE, "en_US.UTF-8");
if (lcc) {
bb_print(bbString(lcc));
} else {
bb_print(bbString(L"NULL",4));
}
Windows is a different beast: setlocale returns NULL for this codepage value. Any other valid (for Windows) settings like "" or "English" lead to case conversion working only partially - Cyrillic characters are converted to something unrecognizable.
This seems to work on windows (msvc x64 anyway...):
bbString bbString::toUpper()const{
std::setlocale( LC_ALL,"English" ); //this seems to be the correct
incantation....
Rep *rep=Rep::alloc( length() );
for( int i=0;i<length();++i ) rep->data[i]=std::towupper( data()[i] );
//note: using std::towupper.
return rep;
}
You'll also need these at the top of the file:
#include <cwctype>
#include <clocale>
The cyrillic looks OK to me but I can't be sure - here's app output:
ПРИВЕТ
привет
Привет
ää
ÄÄ
Ää
It's OK to use bbString chars directly with towupper (or std::towupper) so you don't need to realloc the string - bbString chars are effectively wchar_t's.
On Tue, Mar 20, 2018 at 4:47 PM, secondgeargames [email protected] wrote:
On Mac OS this is solved by setting codepage to en_US.UTF-8 before calling std::toupper()
const char *lcc = std::setlocale(LC_CTYPE, "en_US.UTF-8"); if (lcc) { bb_print(bbString(lcc)); } else { bb_print(bbString(L"NULL",4)); }
Windows is a different beast: set locale returns NULL for this codepage value. Any other valid (for Windows) settings like "" or "English" lead to case conversion working only partially - Cyrillic characters are converted to something unrecognizable.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374464046, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QsWP5R_t0yQTJ98lSfNxfwPAUVGLks5tgHu2gaJpZM4SretL .
::towupper works here too.
Sure it isn't your theme affecting cyrillic output? I'm using 'hollow'...
On Tue, Mar 20, 2018 at 5:09 PM, Mark Sibly [email protected] wrote:
This seems to work on windows (msvc x64 anyway...):
bbString bbString::toUpper()const{ std::setlocale( LC_ALL,"English" ); //this seems to be the correct incantation.... Rep *rep=Rep::alloc( length() ); for( int i=0;i<length();++i ) rep->data[i]=std::towupper( data()[i] ); //note: using std::towupper. return rep; }You'll also need these at the top of the file:
#include <cwctype> #include <clocale>The cyrillic looks OK to me but I can't be sure - here's app output:
ПРИВЕТ привет Привет ää ÄÄ ÄäIt's OK to use bbString chars directly with towupper (or std::towupper) so you don't need to realloc the string - bbString chars are effectively wchar_t's.
On Tue, Mar 20, 2018 at 4:47 PM, secondgeargames <[email protected]
wrote:
On Mac OS this is solved by setting codepage to en_US.UTF-8 before calling std::toupper()
const char *lcc = std::setlocale(LC_CTYPE, "en_US.UTF-8"); if (lcc) { bb_print(bbString(lcc)); } else { bb_print(bbString(L"NULL",4)); }
Windows is a different beast: set locale returns NULL for this codepage value. Any other valid (for Windows) settings like "" or "English" lead to case conversion working only partially - Cyrillic characters are converted to something unrecognizable.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374464046, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QsWP5R_t0yQTJ98lSfNxfwPAUVGLks5tgHu2gaJpZM4SretL .
Seems to work with mingw and msvc x86 too!
On Tue, Mar 20, 2018 at 5:11 PM, Mark Sibly [email protected] wrote:
::towupper works here too.
Sure it isn't your theme affecting cyrillic output? I'm using 'hollow'...
On Tue, Mar 20, 2018 at 5:09 PM, Mark Sibly [email protected] wrote:
This seems to work on windows (msvc x64 anyway...):
bbString bbString::toUpper()const{ std::setlocale( LC_ALL,"English" ); //this seems to be the correct incantation.... Rep *rep=Rep::alloc( length() ); for( int i=0;i<length();++i ) rep->data[i]=std::towupper( data()[i] ); //note: using std::towupper. return rep; }You'll also need these at the top of the file:
#include <cwctype> #include <clocale>The cyrillic looks OK to me but I can't be sure - here's app output:
ПРИВЕТ привет Привет ää ÄÄ ÄäIt's OK to use bbString chars directly with towupper (or std::towupper) so you don't need to realloc the string - bbString chars are effectively wchar_t's.
On Tue, Mar 20, 2018 at 4:47 PM, secondgeargames < [email protected]> wrote:
On Mac OS this is solved by setting codepage to en_US.UTF-8 before calling std::toupper()
const char *lcc = std::setlocale(LC_CTYPE, "en_US.UTF-8"); if (lcc) { bb_print(bbString(lcc)); } else { bb_print(bbString(L"NULL",4)); }Windows is a different beast: set locale returns NULL for this codepage value. Any other valid (for Windows) settings like "" or "English" lead to case conversion working only partially - Cyrillic characters are converted to something unrecognizable.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374464046, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QsWP5R_t0yQTJ98lSfNxfwPAUVGLks5tgHu2gaJpZM4SretL .
Just checked on Windows: std::towupper with "English" worked (msvc x86)! Thank you Mark! Has nothing to do with theme - I'm using Prime-Blue.
2 targets down, feel free to add more!
On Tue, Mar 20, 2018 at 5:21 PM, secondgeargames [email protected] wrote:
Just checked on Windows: std::towupper with "English" worked (msvc x86)! Thank you Mark!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374469792, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QimL-y2jDj0MVXwV-mmbPNeV0c3Dks5tgIPjgaJpZM4SretL .
On Android (emulator) this doesn't work:
#include <cwctype>
#include <clocale>
...
std::setlocale( LC_CTYPE,"en_US.UTF-8" );
Rep *rep=Rep::alloc( length() );
for( int i=0;i<length();++i ) rep->data[i]=std::towupper( data()[i] );
return rep;
...
although setlocale returns "C.UTF-8". Changing LC_CTYPE to LC_ALL has no effect either.
The above code works on both macOS and iOS (simulator). It didn't work on Android (emulator and device) and I don't have a Linux machine set up to test the Linux target.
I was curious how it is solved in Godot. It seems they just built a lookup table of upper and lower case characters (some 650 or so of them) and use it for case conversion. Guaranteed to be cross-platform :)
Are there any functions that would work with your utf8 encoded CStrings?
According to this, we'll need to use Java code for ToUpper/ToLower/Capitalize - yuck:
https://issuetracker.google.com/issues/36974962
On Wed, Mar 21, 2018 at 4:39 AM, secondgeargames [email protected] wrote:
The above code works on both macOS and iOS (simulator). It didn't work on Android (emulator on Windows) and I don't have a Linux machine set up tp test Linux target.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374644858, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QkY_OZNQg7fVVy8896--4prfRLYSks5tgSLBgaJpZM4SretL .
Ok, attempt one at android version complete - please give it a whirl and also check other versions, I'll do linux now.
On Wed, Mar 21, 2018 at 10:05 AM, Mark Sibly [email protected] wrote:
According to this, we'll need to use Java code for ToUpper/ToLower/Capitalize - yuck:
https://issuetracker.google.com/issues/36974962
On Wed, Mar 21, 2018 at 4:39 AM, secondgeargames <[email protected]
wrote:
The above code works on both macOS and iOS (simulator). It didn't work on Android (emulator on Windows) and I don't have a Linux machine set up tp test Linux target.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374644858, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QkY_OZNQg7fVVy8896--4prfRLYSks5tgSLBgaJpZM4SretL .
It appears to work fine in:
- Android (compiled on Windows, tested on both emulator and device)
- Windows 10 (msvc and gcc x86)
- MacOS and iOS (simulator)
I agree with "yuck". It would be much easier if c++ had a consistent set of cross-platform standard libraries for such essential things. Sorry it turned out more trouble than I expected.
Sorry it turned out more trouble than I expected.
No problemo, had to be done and we've got a Monkey2Lang.java file in there now which'll make future problems like this easier to deal with.
I can kind of see the problem - they've really got to provide 2 implementations of this stuff, it's probably not worth it. What I'm mainly surprised by is that ToLower and ToUpper can produce different results in different locale? I would have guessed that, working with unicode at least, they would just do the same thing everywhere...
On Wed, Mar 21, 2018 at 3:28 PM, secondgeargames [email protected] wrote:
It appears to work fine in:
- Android (compiled on Windows, tested on both emulator and device)
- Windows 10 (msvc and gcc x86)
- MacOS
I agree with "yuck". It would be a better world if c++ had a consistent set of cross-platform standard libraries for such essential things. Sorry it turned out more trouble than I expected.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/blitz-research/monkey2/issues/355#issuecomment-374818092, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU3QtVnFiiZuB-gslcckOrm1koAlz2Cks5tgbrMgaJpZM4SretL .
Found one edge case. I don't ask you to fix it, I'm just writing about it here for future reference. German eszett is capitalized differently on different platforms: Windows: Straße becomes STRAßE Android: Straße becomes STRASSE I found the same behavior in Monkey-X, and I just worked around it.
Just out of interest, which one of those is correct? Based on my very limited knowledge of german, they are both acceptable, aren't they?
AFAIK they are moving towards capital ß instead of SS, but both are still accepted. I personally prefer ß: the length of the word is preserved, and there is no problems with things like "Straße".ToUpper().ToLower() = "strasse".