toml11
Support C++20 and std::u8string
When compiling for C++20, the following error occurs:
../tests/test_parse_unicode.cpp:51:23: error: no matching conversion for functional-style cast from 'const char8_t [53]' to 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >')
std::string(u8"Ýôú'ℓℓ λáƭè ₥è áƒƭèř ƭλïƨ - #"));
It looks like the introduction of std::u8string is causing problems for conversions between char8_t and std::string types.
I'm not sure what the best way to handle this is. My first thought is to create a type alias that can be configured to std::string for C++11, C++14, and C++17, or to std::u8string for C++20 and newer. That raises an important question: should the toml11 API support only std::u8string for C++20 and beyond?
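A minimal sketch of the type-alias idea follows. It assumes the predefined feature-test macro `__cpp_char8_t` is used to detect whether `char8_t` is enabled. The names `example::utf8_string` and `example::utf8_char` are hypothetical and are not part of toml11.

```cpp
#include <cassert>
#include <string>

// Hypothetical namespace for illustration; not part of toml11.
namespace example {

#if defined(__cpp_char8_t)
// C++20 (or -fchar8_t): u8"" literals have type const char8_t[N],
// so the UTF-8 string type becomes std::u8string.
using utf8_string = std::u8string;
#else
// C++11/14/17: u8"" literals have type const char[N],
// so a plain std::string holds the same bytes.
using utf8_string = std::string;
#endif

// The character type follows the alias automatically.
using utf8_char = utf8_string::value_type;

} // namespace example
```

With this alias, `example::utf8_string s = u8"abc";` compiles in every mode. The catch, as the question hints, is that code written against `std::string` stops accepting `example::utf8_string` once it becomes `std::u8string` in C++20 mode.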
Yes, I know that problem... The other day I did the same thing you did and ran into the same error. I'm also not sure what the best way to deal with it is. Anyway, thank you for reporting this. It has raised the priority.
There are several options. One is, as you suggested, to add a type alias that switches the implementation of toml::string from std::string to std::u8string. That way users do not need to care about which character type is used, but combining it with existing code that does not use u8string could become a bit harder in C++20 mode.
Another is to add a template parameter to toml::value to give users a choice. Users could pick which one to use in their own code, but the templated code would become messy.
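The template option can be sketched roughly as below. `basic_value`, `value`, `u8value`, and `string_length` are hypothetical names for illustration only; this is not toml11's actual definition of toml::value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch: a value type parameterised on the string type it stores.
// This is not toml11's actual toml::value.
template <typename StringT>
class basic_value {
  public:
    using string_type = StringT;

    explicit basic_value(string_type s) : str_(std::move(s)) {}

    const string_type& as_string() const noexcept { return str_; }

  private:
    string_type str_;
};

// Users pick the string type; the std::string default keeps existing code working.
using value = basic_value<std::string>;

#if defined(__cpp_char8_t)
using u8value = basic_value<std::u8string>;
#endif

// Every helper that touches strings must now be a template as well,
// which is where the "messy" part comes from.
template <typename StringT>
std::size_t string_length(const basic_value<StringT>& v) {
    return v.as_string().size();
}
```

The choice moves to the user, but the template parameter spreads into every function that accepts a value.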
The most ad hoc solution is to convert the char8_t literals to std::string byte by byte in the test code, but that does not solve the fundamental problem.
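A minimal sketch of that byte-by-byte conversion, assuming a hypothetical helper named `from_u8` that would replace the failing `std::string(u8"...")` calls in the tests:

```cpp
#include <cassert>
#include <string>

#if defined(__cpp_char8_t)
// C++20: u8"" literals are const char8_t[N]. Copy each UTF-8 code unit
// into a char; the byte values are unchanged, so the result is still UTF-8.
inline std::string from_u8(const char8_t* s) {
    std::string out;
    for (; *s != 0; ++s) {
        out.push_back(static_cast<char>(*s));
    }
    return out;
}
#else
// Before C++20, u8"" literals are already const char[N].
inline std::string from_u8(const char* s) { return std::string(s); }
#endif
```

The test line from the error above would then read `from_u8(u8"...")`. As noted, this only papers over the test code; library users still face the same conversion problem.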
Basically, I want to give users flexibility and control, so I prefer the second option above: the template. But I haven't done anything about it yet because the priority was low. Also, since I noticed the problem only a few days ago, I'm still not very confident about the solution. There could be another, better idea; I'm not sure...
I dug up some information on this, and it looks like nobody is happy about the conversions that std::u8string and char8_t break. Several standard library types also seem to be missing proper specializations for the u8 types in C++20.
- {fmt} issue on char8_t support: https://github.com/fmtlib/fmt/issues/1405
- StackOverflow answer about C++ u8 conversions: https://stackoverflow.com/a/59055485/9835303
- Proposal not to use std::u8string or char8_t: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1747r0.html
- PR to fix std::u8string usage in PyBind: https://github.com/pybind/pybind11/pull/2026
Would there be a way to also support reading into a wstring instead of a string, and serializing from a wstring as UTF-8?
Sorry for the late response, but we have no plan to serialize into or deserialize from wstring. wchar_t is an implementation-defined character type, and its internal representation is not guaranteed to be Unicode (it could be a local character encoding). Even if the environment uses Unicode, the encoding of wchar_t might not be UTF-8 but UTF-16 (e.g., Windows) or UTF-32 (e.g., Linux). Since the TOML specification requires TOML documents to be encoded in UTF-8, we can focus on char (the traditional way of handling byte arrays) and char8_t.
You can use your compiler's built-in facilities or an OS API to convert between an array of wchar_t and a UTF-8 byte buffer. <codecvt> could be another option, but note that codecvt_utf8 has been deprecated since C++17.
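As a portable alternative to OS APIs and the deprecated codecvt_utf8, the wstring-to-UTF-8 direction can also be written by hand. This is a swapped-in technique, not something toml11 provides. The sketch assumes wchar_t holds UTF-16 when it is 2 bytes wide and UTF-32 otherwise; that matches mainstream Windows and Linux/macOS toolchains but, as explained above, is not guaranteed by the standard. `wide_to_utf8` and `append_utf8` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point (at most U+10FFFF) to out as 1 to 4 UTF-8 bytes.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Convert a wide string to UTF-8. Assumption: 2-byte wchar_t means UTF-16,
// otherwise UTF-32. Invalid input (unpaired surrogates, values above
// U+10FFFF) is replaced with U+FFFD.
inline std::string wide_to_utf8(const std::wstring& ws) {
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        char32_t cp = static_cast<char32_t>(ws[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // UTF-16 high surrogate: combine it with the following low surrogate.
            if (i + 1 < ws.size()) {
                char32_t lo = static_cast<char32_t>(ws[i + 1]);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    ++i;
                    append_utf8(out, cp);
                    continue;
                }
            }
            cp = 0xFFFD; // high surrogate without a matching low surrogate
        } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD; // lone low surrogate, or not a valid code point
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The result can then be handed to a UTF-8 based parser as an ordinary std::string. The reverse direction (UTF-8 back to wchar_t) would need a decoder and is not shown.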
No problem.
I have found a much better C++ parser for TOML in the meantime, one that supports all conversions to and from STL containers and whose author is more open-minded about feature requests that would make the library useful for more people.
Nice. Most of the libraries are provided as is and toml11 is no exception. I hope you could solve your problem. The implementation of new features might take some time, and I don't always have time. But pull requests for new features are always welcome.
Coming back to the original problem, I have added a workaround, and now both ""_toml and u8""_toml literals work in C++20 mode in the current release. CI now also runs the test cases in C++20 mode with several well-known compilers, and all the features seem to work in C++20.
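The thread does not show how the workaround is implemented. The following is only a hypothetical illustration of the general pattern: a single literal suffix overloaded for both `const char*` and `const char8_t*`. toml11's ""_toml additionally parses the text as TOML. The suffix `_utf8` and the namespace `example_literals` are made-up names, and the `operator""_utf8` spelling without a space assumes C++14 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical literal suffix; not toml11's implementation.
namespace example_literals {

// Plain "..." literals (and u8"..." before C++20) arrive as const char*.
inline std::string operator""_utf8(const char* s, std::size_t n) {
    return std::string(s, n);
}

#if defined(__cpp_char8_t)
// In C++20, u8"..." literals arrive as const char8_t*. The bytes are the
// same UTF-8 code units, and char may alias them, so view them as const char*.
inline std::string operator""_utf8(const char8_t* s, std::size_t n) {
    return std::string(reinterpret_cast<const char*>(s), n);
}
#endif

} // namespace example_literals
```

Both `"abc"_utf8` and `u8"abc"_utf8` then yield a std::string in every language mode, so callers do not need to care which literal type their compiler produces.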
And thank you very much, jwillikers, for surveying the situation.
Currently u8string is still not supported, but I will later add support for std::u8string in get and find, as well as conversion from std::u8string to toml::value. A normal std::string will still be used as the internal string representation, so we will not be able to get a raw reference to a u8string, but I think it is a good compromise in the current situation. Adding many #ifdefs would make the code complicated.
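The trade-off can be sketched as follows. `string_value` is a hypothetical class standing in for toml::value's string storage; toml11's actual get and find interfaces are not reproduced here. UTF-8 text always lives in a std::string, and std::u8string is converted only at the boundary.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch of the compromise: UTF-8 text is always stored in a
// plain std::string, and std::u8string is only converted at the boundary.
class string_value {
  public:
    explicit string_value(std::string s) : str_(std::move(s)) {}

#if defined(__cpp_char8_t)
    // Converting in: copy the char8_t code units into the std::string.
    explicit string_value(const std::u8string& s) : str_(s.begin(), s.end()) {}
#endif

    // A reference can only be handed out for the internal representation...
    const std::string& as_string() const noexcept { return str_; }

#if defined(__cpp_char8_t)
    // ...so a std::u8string has to be returned by value, as a copy.
    std::u8string as_u8string() const {
        return std::u8string(str_.begin(), str_.end());
    }
#endif

  private:
    std::string str_;
};
```

Only two small #if blocks are needed because the storage type never changes; the cost is an extra copy whenever a std::u8string is requested.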
Most of the libraries are provided as is and toml11 is no exception.
I understand that very well. The only reason I ever asked about std::wstring support is that it is part of the C++ STL, and using std::wstring and the underlying wchar_t is practically unavoidable if you want to do any C++ coding on Windows.
I also understand that wchar_t is not the same size on Linux / macOS, and that char there usually means UTF-8, so if you wrote your library with those operating systems in mind, it is clear why you would refuse to support wchar_t and std::wstring.
I hope you could solve your problem.
Yes, I have solved it by switching to toml++.
The implementation of new features might take some time, and I don't always have time. But pull requests for new features are always welcome.
I understand that as well. However, people sometimes need to get their own work done too. That's usually why they look for a library someone else wrote in the first place -- to avoid having to implement stuff in a domain they aren't familiar with under time constraints of their own project or work assignment.
Sorry for going slightly off-topic, and I apologize if I came across as disrespectful in my previous response.