
PugiXML incorrectly handles char8_t in C++20

Open ytimenkov opened this issue 5 years ago • 3 comments

In C++20 there is a std::u8string and char8_t to make UTF-8 type-safe.

Unfortunately, when calling xml_attribute::set_value(u8"a string"), for example, the compiler chooses the set_value(bool) overload, which led to a surprise when all attributes became just "true".
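A minimal sketch of the overload resolution involved, without pugixml itself. `mock_attribute` and the helper functions are hypothetical stand-ins that model only the two overloads mentioned here. The same call compiles before and after C++20 but stores different values, because a `u8` literal changes type from `const char[N]` to `const char8_t[N]`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for pugi::xml_attribute: only the two overloads
// discussed in this issue are modelled, nothing else from pugixml.
struct mock_attribute {
    std::string value;

    bool set_value(const char* rhs) { value = rhs; return true; }
    bool set_value(bool rhs) { value = rhs ? "true" : "false"; return true; }
};

// A plain narrow literal is const char[N] and selects set_value(const char*).
inline std::string store_plain_literal() {
    mock_attribute attr;
    attr.set_value("a string");
    return attr.value;
}

// With char8_t enabled, u8"..." is const char8_t[N]. It cannot convert to
// const char*, but it decays to a pointer, and any pointer converts to bool.
inline std::string store_u8_literal() {
    mock_attribute attr;
    attr.set_value(u8"a string");
    return attr.value;
}

// What store_u8_literal() yields under the language mode being compiled.
inline std::string expected_u8_result() {
#if defined(__cpp_char8_t)
    return "true";      // C++20: the bool overload was chosen
#else
    return "a string";  // before C++20: u8 literals are plain char arrays
#endif
}

inline void check_overloads() {
    assert(store_plain_literal() == "a string");
    assert(store_u8_literal() == expected_u8_result());
}
```

Calling `check_overloads()` passes in either language mode; only the value stored by `store_u8_literal()` differs between them.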

Which makes me think that providing an overload which accepts bool is too wide: it accepts any pointer, at least. It would be nice to constrain it somehow.

I'm not sure whether separate handling is needed when pugi::char_t is wchar_t, or whether it is enough to provide such an overload only when it is char.

ytimenkov avatar Oct 29 '20 10:10 ytimenkov

I also thought that it may be a good idea to just use char8_t for pugi::char_t, since UTF-8 is used internally anyway.

I think if the consumer could define PUGIXML_TEXT and PUGIXML_CHAR directly instead of relying on PUGIXML_WCHAR_MODE, things would just work (or pugixml could provide more knobs...)

ytimenkov avatar Oct 29 '20 13:10 ytimenkov

Which makes me think that providing an overload which accepts bool is too wide: it accepts any pointer, at least. It would be nice to constrain it somehow.

This is sensible; it can probably be achieved using a private overload with a const void* argument, since enable_if and similar constructs can't be used due to compatibility requirements. Separately, char8_t as char_t won't work because of reliance on some CRT functions like strcmp.
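A rough sketch of how a private const void* overload could catch these calls, assuming a simplified class rather than the real pugixml one. `guarded_attribute`, `can_set_value` and `store_checked` are hypothetical names. Overload ranking prefers a pointer-to-`const void*` conversion over a pointer-to-`bool` conversion, so stray pointers select the private overload and the call is rejected at compile time. The detection trait uses C++11 only to make that visible in a test; the class itself needs nothing newer than C++98.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical simplified attribute type, not the real pugi::xml_attribute.
struct guarded_attribute {
    std::string value;

    bool set_value(const char* rhs) { value = rhs; return true; }
    bool set_value(bool rhs) { value = rhs ? "true" : "false"; return true; }

private:
    // Declared but never defined. For a pointer argument that is not
    // convertible to const char*, this beats set_value(bool), and the call
    // then fails the access check instead of silently storing "true".
    bool set_value(const void*);
};

// Test-only detection trait: true when attr.set_value(Arg) is a valid call
// from outside the class.
template <typename Arg, typename = void>
struct can_set_value : std::false_type {};

template <typename Arg>
struct can_set_value<Arg, decltype(void(
    std::declval<guarded_attribute&>().set_value(std::declval<Arg>())))>
    : std::true_type {};

static_assert(can_set_value<const char*>::value, "strings still work");
static_assert(can_set_value<bool>::value, "bool still works");
static_assert(!can_set_value<const unsigned char*>::value, "other pointers are rejected");
static_assert(!can_set_value<const wchar_t*>::value, "other pointers are rejected");
#if defined(__cpp_char8_t)
static_assert(!can_set_value<const char8_t*>::value, "char8_t pointers are rejected");
#endif

// The intended string call is unaffected by the extra overload.
inline std::string store_checked(const char* text) {
    assert(text != nullptr);
    guarded_attribute attr;
    attr.set_value(text);
    return attr.value;
}
```

With this shape, `attr.set_value(u8"a string")` in C++20 becomes a compile error about an inaccessible member rather than a silent "true".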

zeux avatar Oct 29 '20 18:10 zeux

Separately, char8_t as char_t won't work because of reliance on some CRT functions like strcmp.

Oh, I didn't look that deep, but it feels like a case for std::char_traits<char_t>::compare...
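To illustrate the idea, here is a sketch of a character-type-agnostic equality check built on std::char_traits. `str_equal` is a hypothetical helper, not a pugixml function. Note that `compare` takes an explicit count rather than stopping at a terminator, so the lengths are computed first, which means two passes over the strings where strcmp needs one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: equality of two null-terminated strings of any
// character type that std::char_traits supports (char, wchar_t, and
// char8_t in C++20).
template <typename CharT>
bool str_equal(const CharT* lhs, const CharT* rhs) {
    assert(lhs != nullptr && rhs != nullptr);
    typedef std::char_traits<CharT> traits;

    // compare() needs a count, so measure both strings first.
    const std::size_t n = traits::length(lhs);
    return n == traits::length(rhs) && traits::compare(lhs, rhs, n) == 0;
}
```

For example, `str_equal("node", "node")` and `str_equal(L"node", L"node")` both go through the same template.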

I guess I should continue sticking to reinterpret_cast for now, just use it carefully :)
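Roughly, the workaround can be wrapped in one small helper so the cast lives in a single place. `as_char` is a hypothetical name. Reading char8_t data through a const char* is permitted by the aliasing rules, and the bytes are unchanged.

```cpp
#include <cassert>
#include <cstring>

#if defined(__cpp_char8_t)
// C++20: char8_t is a distinct type, so an explicit cast is required.
inline const char* as_char(const char8_t* text) {
    return reinterpret_cast<const char*>(text);
}
#else
// Before C++20, u8 literals are already arrays of char.
inline const char* as_char(const char* text) {
    return text;
}
#endif

// Only the static type differs; the UTF-8 bytes are the same.
inline void check_as_char() {
    assert(std::strcmp(as_char(u8"a string"), "a string") == 0);
}
```

A call would then look like `attr.set_value(as_char(u8"a string"))`, which selects the const char* overload in both language modes.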

ytimenkov avatar Oct 30 '20 14:10 ytimenkov