opentelemetry-cpp
Unicode logs/attributes support
Hi! In most cases a simple string_view or const char * is sufficient for the needs of our project. But I wonder whether there are plans to extend logs/attributes with Unicode support (neither the Logger functions nor AttributeValue seem to support wchar_t)?
Is your feature request related to a problem? We use logs/metrics (and traces in the future) quite extensively, and recently the question was raised whether we can at least have Unicode attributes. For example, somebody wants to extend a log message with an attribute holding a company name that cannot be represented with ASCII characters (a Chinese name, for example).
Describe the solution you'd like It would be nice to be able to log messages and/or add attributes that can represent a wider range of characters.
Describe alternatives you've considered One alternative I was thinking about is the ability to send an array of bytes together with its encoding, so that the consumer knows how to interpret those bytes.
This issue was marked as stale due to lack of activity.
Hi! Is anything planned for this one?
This request is valid and makes sense. Planning is constrained by resources. This sounds like a good task for getting into the opentelemetry-cpp code, so I am adding the help needed label.
This issue is available for anyone to work on. Make sure to reference this issue in your pull request. :sparkles: Thank you for your contribution! :sparkles:
I believe const char * and nostd::string_view are sufficient for handling UTF-8 encoded strings. UTF-8 can represent any Unicode character as a sequence of 1 to 4 bytes. The use of wchar_t can be problematic because its size and encoding differ between platforms. Or am I missing something?
A const char* string can represent Unicode.
What needs to be clarified is whether the information "this is an ASCII string", "this is encoded in UTF-8", or "this is encoded in the XYZ character set" needs to be represented somewhere.
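To illustrate the point about UTF-8 in plain char strings, here is a minimal sketch in standard C++ only. `CountCodePoints` and `kCompanyName` are hypothetical names made up for this example and are not part of opentelemetry-cpp. The company name is written with explicit byte escapes so the example does not depend on the source file encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not an opentelemetry-cpp API): counts Unicode code
// points in a NUL-terminated string that is ASSUMED to be valid UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t CountCodePoints(const char* utf8) {
  assert(utf8 != nullptr);
  std::size_t count = 0;
  for (; *utf8 != '\0'; ++utf8) {
    const unsigned char byte = static_cast<unsigned char>(*utf8);
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}

// Two Chinese characters (U+5317 U+4EAC) encoded as UTF-8: 3 bytes each.
const char kCompanyName[] = "\xE5\x8C\x97\xE4\xBA\xAC";
```

Here `std::strlen(kCompanyName)` reports 6 bytes while `CountCodePoints(kCompanyName)` reports 2 characters. The string is still an ordinary `const char *`, so it should be usable anywhere the API accepts `const char *` or `nostd::string_view`, provided the consumer interprets the bytes as UTF-8.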
This will need a lot of testing too.
Supporting (or allowing callers to send/detect) different Unicode encodings, e.g. UTF-8, UTF-16, etc.? I don't think this should be done. Strings should be expected to be encoded as UTF-8. Even the protobuf format defines a string as valid UTF-8:
string := valid UTF-8 string (e.g. ASCII); max 2GB of bytes
This seems related to https://github.com/open-telemetry/opentelemetry-specification/issues/3421.
Personally I think that supporting arbitrary encodings (+ conversions to UTF-8 or raw bytes at OTLP boundaries) would be a large increase in the complexity of the API and SDK without a large enough payoff.
In the issue description, @tobervenec seems to have assumed that nostd::string_view or const char * is meant to be used only for ASCII text. Instead of adding support for arbitrary encodings, could we start by documenting that we expect those to contain UTF-8 encoded strings?
We may also want to implement validation/sanitization for invalid UTF-8 strings (currently, IIUC, we pass them through blindly). That should be a separate issue, and should probably be blocked on the resolution of the spec issue.
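As a rough idea of what such validation could look like, here is a minimal sketch of a strict UTF-8 check in standard C++. `IsValidUtf8` is a hypothetical function written for this comment, not an existing opentelemetry-cpp API, and the rules it enforces (no overlong forms, no surrogates, nothing above U+10FFFF) are my assumption of what "valid" should mean pending the spec issue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical validator (not an opentelemetry-cpp API).
// Returns true only if `bytes` is well-formed UTF-8.
bool IsValidUtf8(const std::string& bytes) {
  const std::size_t n = bytes.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    if (lead < 0x80) {  // single-byte (ASCII) code point
      ++i;
      continue;
    }
    std::size_t len = 0;          // total bytes in this sequence
    unsigned long cp = 0;         // decoded code point
    unsigned long min_value = 0;  // smallest value allowed for this length
    if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min_value = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min_value = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min_value = 0x10000;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (n - i < len) {
      return false;  // sequence truncated at end of input
    }
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
      if ((c & 0xC0) != 0x80) {
        return false;  // expected a continuation byte (10xxxxxx)
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_value) return false;                // overlong encoding
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    i += len;
  }
  assert(i == n);  // every byte was consumed as part of a valid sequence
  return true;
}
```

With this sketch, `IsValidUtf8("\xE5\x8C\x97")` accepts a three-byte Chinese character, while a truncated sequence such as `"\xE5\x8C"` or an overlong form such as `"\xC0\x80"` is rejected. Whether invalid input should then be dropped, replaced with U+FFFD, or sent as raw bytes is a separate design decision that this sketch does not address.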