
obtain unicode character

Open NanoJeoMichael opened this issue 9 years ago • 7 comments

I'm working on a project that needs to parse Unicode characters. I have an XML file that looks roughly like this:

<map>
    <unicode v="&#955;"/>
    <unicode v="a"/>
</map>

The value of the attribute "v" in each "unicode" node is a single Unicode character (λ and a in the example above). The problem is that there is no function to read it as a character, even when it is plain ASCII like 'a'. I tried code like this:

const XMLElement* root = _doc.RootElement();
const XMLElement* e = root->FirstChildElement("unicode");
int v = 0; // or wchar_t
// could a function like QueryCharAttribute be added to read a Unicode character?
int err = e->QueryIntAttribute("v", &v);  // I get the error 'XML_WRONG_ATTRIBUTE_TYPE'

and I tried this:

const char* s = e->Attribute("v");
wchar_t c = (wchar_t)s[0]; // I get a wrong value for '&#955;' (s[0] is only the first UTF-8 byte)

I know there's a stupid way to solve this problem: just replace each Unicode character with its code point, so the XML would look like this:

<map>
    <unicode v="955"/>
    <unicode v="97"/>
</map>

But is there a more efficient way to achieve my goal? Thanks!

NanoJeoMichael, Apr 20 '15 06:04

:+1:

liaofeng, Apr 20 '15 09:04

Get attribute as string and convert from utf8
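
A minimal sketch of that conversion, assuming the attribute holds exactly one character and that the parser has already turned `&#955;` into UTF-8 bytes. `DecodeFirstCodePoint` is a hypothetical helper, not part of tinyxml2, and it does not reject overlong encodings or surrogate code points:

```cpp
#include <cstdint>

// Hypothetical helper (not a tinyxml2 function): decodes the first UTF-8
// sequence in `s` and returns its code point, or -1 if `s` is null, empty,
// or starts with a malformed sequence.
std::int32_t DecodeFirstCodePoint(const char* s) {
    if (!s || !*s) return -1;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    const unsigned char lead = p[0];

    int extra = 0;         // number of continuation bytes expected
    std::int32_t cp = 0;   // code point being assembled
    if (lead < 0x80) {
        return lead;                                   // plain ASCII
    } else if ((lead & 0xE0) == 0xC0) {
        extra = 1; cp = lead & 0x1F;                   // 2-byte sequence
    } else if ((lead & 0xF0) == 0xE0) {
        extra = 2; cp = lead & 0x0F;                   // 3-byte sequence
    } else if ((lead & 0xF8) == 0xF0) {
        extra = 3; cp = lead & 0x07;                   // 4-byte sequence
    } else {
        return -1;                                     // stray continuation or invalid lead byte
    }

    for (int i = 1; i <= extra; ++i) {
        // A continuation byte must look like 10xxxxxx. The NUL terminator
        // fails this check, so a truncated string never reads out of bounds.
        if ((p[i] & 0xC0) != 0x80) return -1;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    return cp;
}
```

With the XML from the question, `DecodeFirstCodePoint(e->Attribute("v"))` should give 955 for the first element and 97 for the second.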

kleuter, Apr 20 '15 11:04

@kleuter Handling character encoding in C++ is tough (I'm actually struggling with it...), but that looks like the best way to solve my problem so far. Thanks a lot for your patience :+1:

NanoJeoMichael, Apr 20 '15 16:04

I don't see how this can be fixed other than by storing the decoded integer value of the code point inside the attribute "just in case". The current code sees the entity encoding and crafts UTF-8 from it, and dealing properly with that UTF-8 is totally non-trivial.

Dmitry-Me, May 20 '15 12:05

TinyXML-2 is pure Unicode (UTF-8). It doesn't support UTF-16 or UCS-2, and won't. Character encoding is just too big a topic and should be handled outside of TinyXML-2.

leethomason, May 26 '15 18:05

@leethomason Actually, there is code inside that can handle this. And the problem is not about converting strings back and forth; it's about getting a single character (perhaps as an int).

Dmitry-Me, May 27 '15 07:05

Fair point; I get caught up on the UTF-16 thing. Returning as int (UTF-32) is actually a pretty reasonable API. I'm not sure what the overlap is between UTF-32 and UTF-16 - need to do some research there - but if they mostly overlap it's probably pretty useful.
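
For reference on that overlap: a code point below U+10000 (outside the surrogate range U+D800 to U+DFFF) is a single UTF-16 unit with the same numeric value, while U+10000 through U+10FFFF needs a surrogate pair. A rough sketch of the mapping, where `ToUtf16` is a hypothetical name used only for illustration and not anything in TinyXML-2:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: converts one code point (UTF-32 value) to UTF-16.
// Returns an empty vector for values that are not valid Unicode scalars.
std::vector<std::uint16_t> ToUtf16(std::uint32_t cp) {
    std::vector<std::uint16_t> out;
    if (cp > 0x10FFFF) return out;                   // beyond the Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;    // reserved for surrogates

    if (cp < 0x10000) {
        // Basic Multilingual Plane: UTF-16 and UTF-32 values are identical.
        out.push_back(static_cast<std::uint16_t>(cp));
        return out;
    }

    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000;
    out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));    // high surrogate
    out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    return out;
}
```

So λ (U+03BB, decimal 955) is the same value in both encodings, and only code points above U+FFFF differ.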

leethomason, May 27 '15 18:05