tinyxml2
tinyxml2 copied to clipboard
Windows and Utf-8
For the Windows compiler I noticed that fopen_s
is used without an encoding specifier in LoadFile
. According to the docs that means the file will be encoded ANSI rather than utf-8. I'm wondering if it should have the ccs=UTF-8
encoding specifier added explicitly.
In particular I am concerned about whether to supply the file path encoded ANSI or utf-8. This becomes a huge pain when dealing with file paths that contain non-ASCII characters, e.g. 'ƒ'
, and I'd like to get it right. It would be nice if tinyxml2 guaranteed that a utf-8-encoded file path would work on all platforms. (Or, alternatively, documented the correct encoding for file paths on each platform.)
I use:
// Now try to create a FILE buffer (allows UNICODE filenames)
const auto eResult = _tfopen_s(&fStream, strFileXML, _T("rb"));
const XMLError eXML = rDocXML.LoadFile(fStream);
This approach allows unicode paths.
If I understand it correctly, _tfopen_s
allows WCHAR
(i.e., utf-16) filenames without the need for explicitly specifying WCHAR
. However, I don't believe that addresses how/whether to deal with utf-8 encoded file names.
In either case, I believe there are really two issues for tinyxml2
.
- Should
tinyxml2
be explicitly specifying utf-8 encoding on whichever flavor offopen_s
it uses? I ask because the doc pages say it always uses utf-8 encoding. - Given the file contents are documented to be utf-8 always, should the file paths have the same requirement? Or in any case, should there be something in the docs about the encoding of file paths?
I appreciate you mentioning _tfopen_s
. I may look into it for my project that's calling tinyxml2
.
Windows flavour of callfopen
could trivially be modified to allow for UTF-8 paths, but it has a downside that might be a deal breaker for this feature - the only truly reliable, side-effect free way of converting a UTF-8 text to UTF-16 is MultiByteToWideChar
but that requires including windows.h
.
Not to mention the potential backwards compatibility issues... it's no telling how many projects using tinyxml2 currently assume LoadFile
/SaveFile
operate on the local codepage, and all those cases would quietly break when upgraded to a newer tinyxml2.
Honestly, I hit the very same issue not 3 days ago so I am on the same boat, but given those backwards compatibility concerns I think the most viable approach is to stick to LoadFile
/SaveFile
overloads that take a FILE*
instead of the path. If you wrap it into a tiny class to use with tinyxml2, you could keep Load/Save as one liners just like now.