Revise support for UTF-8-valid string truncation

Open • jmacd opened this issue 3 years ago • 1 comment

Problem Statement

As discussed in https://github.com/open-telemetry/opentelemetry-proto/issues/426, we currently do not have clarity from the specification for how to implement correct truncation when string-valued attributes exceed the specified limits, and yet the OTLP protobuf encoding requires valid UTF-8.

While https://github.com/open-telemetry/opentelemetry-go/pull/3156 offered a quick fix meant to alleviate the pain for users, this deserves careful consideration. It is possible to implement an O(1) truncation, if that is desired, although even with UTF-8-correct truncation, users can still enter invalid UTF-8 that we have not specified how to handle.
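To make the O(1) option concrete, here is a minimal sketch in Go. It assumes the limit is measured in bytes and that the input is already valid UTF-8. The helper name `truncateUTF8` is hypothetical and is not taken from the SDK or from #3156. Because a UTF-8 sequence is at most four bytes long, backing up from the cut point needs at most three steps, whatever the string length.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 is a hypothetical helper (not part of the OpenTelemetry SDK).
// It cuts s to at most limit bytes without splitting a multi-byte sequence.
// Assumption: s is valid UTF-8; invalid input is passed through unchecked.
func truncateUTF8(s string, limit int) string {
	if limit < 0 {
		limit = 0
	}
	if len(s) <= limit {
		return s
	}
	cut := limit
	// Step back over continuation bytes until the cut lands on the first
	// byte of a sequence. At most utf8.UTFMax-1 (3) steps are ever needed.
	for i := 0; i < utf8.UTFMax-1 && cut > 0 && !utf8.RuneStart(s[cut]); i++ {
		cut--
	}
	return s[:cut]
}

func main() {
	// "é" occupies two bytes, so a 2-byte limit cannot keep it whole.
	fmt.Printf("%q\n", truncateUTF8("héllo", 2))
	fmt.Printf("%q\n", truncateUTF8("héllo", 3))
}
```

Note that this does no validation: if the input already contains invalid bytes, the output can too, which is the trade-off the issue describes.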

Proposed Solution

In the next release cycle (after 1.10.x), consider either faster support with less validation (i.e., an O(1) truncation approach) or a more-comprehensive approach to validation (i.e., ensure valid UTF-8 for all strings, not only truncated attribute values).
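For comparison, the more comprehensive option could look roughly like the sketch below, which checks every string and repairs invalid input. Replacing invalid bytes with U+FFFD is an assumption made for illustration only, since the specification does not yet say how invalid UTF-8 should be handled. The name `sanitizeUTF8` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitizeUTF8 is a hypothetical helper (not part of the OpenTelemetry SDK).
// It returns s unchanged when it is valid UTF-8. Otherwise each run of
// invalid bytes is replaced with U+FFFD; that policy is an assumption,
// not something the specification prescribes.
func sanitizeUTF8(s string) string {
	if utf8.ValidString(s) {
		return s
	}
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	fmt.Printf("%q\n", sanitizeUTF8("ok"))
	fmt.Printf("%q\n", sanitizeUTF8("a\xffb"))
}
```

Unlike a bounded truncation, this costs a full O(n) scan of every string, which is the price of the stronger guarantee.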

Alternatives

Discussed in https://github.com/open-telemetry/opentelemetry-proto/issues/426#issuecomment-1242337687

Sep 09 '22 20:09 jmacd

I asked the OTel-Java group how this is handled. Because the Java String.substring() method counts characters (strictly, UTF-16 code units) rather than bytes, I believe it matches the behavior introduced in #3156.

Sep 14 '22 19:09 jmacd