opentelemetry-specification icon indicating copy to clipboard operation
opentelemetry-specification copied to clipboard

logs: Allow duplicate keys

Open pellared opened this issue 11 months ago • 33 comments

What

https://github.com/open-telemetry/opentelemetry-specification/blob/e8d5e1416a6a491f0fc26df60a5a2bd41535a8ca/specification/logs/data-model.md#type-mapstring-any should allow having duplicate keys.

Why

Performance

Checking for duplicates introduces performance overhead which can be undesirable for systems where logging should be high-performant. Changing the data model definition would allow postponing the deduplication to the exporters which require it (e.g. OTLP). Other exporters (e.g. console exporter) can use the key-values as provided.

Better bridging of existing logging libraries

Some logging libraries allow duplicate keys. The change is needed for faithfully bridging existing logging systems.

For instance slog (the structured logging package from Go the standard library) allows it. See: https://go.dev/play/p/fzIswL0Le7j.

pellared avatar Mar 08 '24 16:03 pellared

This would could to be changed to something like:

Type []keyval<string, any>

Value of type []keyval<string, any> is a collection of key-value pairs with string keys to any values.

Arbitrary deep nesting of values for arrays and maps is allowed (essentially allows to represent an equivalent of a JSON object).

Such change can be considered breaking as it changes the behavior.

However, I believe it can be acceptable given it is needed to correctly bridge existing logging systems. Maybe it could be left to the SDK depending on what is more idiomatic in given language.

pellared avatar Mar 11 '24 10:03 pellared

PTAL @open-telemetry/specs-logs-approvers

pellared avatar Mar 11 '24 11:03 pellared

OTel Rust Logging chose to not-do-dedup for perf reasons! Thanks for raising this.

(Similar for metrics, traces too, but will save it for another day)

cijothomas avatar Mar 12 '24 15:03 cijothomas

Discussed in the 3/12/24 spec SIG that evolving the log data model attributes representation from being a map<string, any> (i.e. no duplicate keys) to []keyval<string, any> (i.e. duplicate keys allowed) seems to be an allowable change because its loosening the restriction.

We did find language in the proto that states that keys in the KeyValueList type MUST be unique. Not sure how we could change that "MUST" without it being considered breaking to consumers.

jack-berg avatar Mar 12 '24 15:03 jack-berg

Note: This is also about log record body representation (not only log attributes).

pellared avatar Mar 12 '24 15:03 pellared

otel-cpp decided against keys de-duplication to prioritize performance, under the assumption that backend/platform-specific de-duplication needs would be addressed within their exporters.

lalitb avatar Mar 12 '24 15:03 lalitb

We did find language in the proto that states that keys in the KeyValueList type MUST be unique. Not sure how we could change that "MUST" without it being considered breaking to consumers.

I opened https://github.com/open-telemetry/opentelemetry-proto/pull/533

pellared avatar Mar 12 '24 15:03 pellared

As mentioned here I can't think of a good backwards-compatible solution to address this change on pdata: https://github.com/open-telemetry/opentelemetry-proto/pull/533#discussion_r1521813605 we implicitly assume keys are unique in our API

mx-psi avatar Mar 12 '24 16:03 mx-psi

@mx-psi how does the collector handle the logs from C++ given what @lalitb said?

MrAlias avatar Mar 12 '24 17:03 MrAlias

This would be a huge breaking change for OTLP proto handling. Right now, we explicitly disallow duplicates and allow implementations to treat the list as a map<string, AnyValue>.

The explicit reason for not choosing map... I beleive had to do with its newness and unavailability in tooling (GoGoProto). I think if we were to re-evaluate that decision today, I'd personally have pushed hard for map.

If Logging needs multiple values per-key, the OTLP they should use is a single key with value of List<.. value type..>. If they wish to define the sdk such that they can build this up with multiple values specified individually, fine.

jsuereth avatar Mar 12 '24 18:03 jsuereth

@jsuereth The point is that doing anything like

If Logging needs multiple values per-key, the OTLP they should use is a single key with value of List<.. value type..>. If they wish to define the sdk such that they can build this up with multiple values specified individually, fine.

would also increase the performance overhead.

See:

  • https://github.com/open-telemetry/opentelemetry-specification/issues/3931#issuecomment-1991911631
  • https://github.com/open-telemetry/opentelemetry-specification/issues/3931#issuecomment-1991939058

pellared avatar Mar 12 '24 18:03 pellared

I understand, but someone has to pay that cost at some point. Right now this would be a large breaking change for OTLP, in my opinion unacceptably large.

jsuereth avatar Mar 12 '24 19:03 jsuereth

I understand, but someone has to pay that cost at some point. Right now this would be a large breaking change for OTLP, in my opinion unacceptably large.

If I understand correctly, nothing would have to be changed in the collector. See https://github.com/open-telemetry/opentelemetry-proto/pull/533#issuecomment-1992343984

pellared avatar Mar 12 '24 19:03 pellared

@mx-psi how does the collector handle the logs from C++ given what @lalitb said?

@MrAlias The behavior is unspecified in this situation, since it is not allowed by the current wording of the spec. I can summarize what the current behavior is, but there are no guarantees that it will keep working the same way in the future since it is an invalid situation

mx-psi avatar Mar 12 '24 19:03 mx-psi

Both OTLP and JSON allows passing such data.

While JSON technically allows it, it also says: When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. I think this is a decent summary of the notion of "duplicate keys".

I understand why logging libraries would optimize for performance, but how much value does it actually contribute towards describing a log in a semantically meaningful way? I think at best we should consider this "deferred deduplication", and clearly apply the similar disclaimer as the JSON spec.

nothing would have to be changed in the collector

It depends whether we need to support duplicate keys throughout the collector, or just be able to handle deduplication during ingest. If the former, we would basically have to rework everything from the syntax used to reference values within structured data to the optimization strategies for parsing and transposing data, because it's all built on the assumption that keys are unique.

djaglowski avatar Mar 12 '24 19:03 djaglowski

just be able to handle deduplication during ingest.

This is good to me.

PS. I did my best to update https://github.com/open-telemetry/opentelemetry-proto/pull/533 to reflect it. Also see: https://github.com/open-telemetry/opentelemetry-proto/pull/533#issuecomment-1992343984

EDIT:

clearly apply the similar disclaimer as the JSON spec.

This is already in the PR.

pellared avatar Mar 12 '24 19:03 pellared

.NET represents logs attributes as key-value pairs.

Related issue: https://github.com/open-telemetry/opentelemetry-dotnet/issues/4324

pellared avatar Mar 13 '24 08:03 pellared

Discussed in the 3/12/24 spec SIG that evolving the log data model attributes representation from being a map<string, any> (i.e. no duplicate keys) to []keyval<string, any> (i.e. duplicate keys allowed) seems to be an allowable change because its loosening the restriction.

I have listened to the recording of the SIG meeting. I cannot understand how this change can be viewed as "allowed". It's obviously a breaking change to a stable spec that would affect any implementation of the spec and of the OTLP protocol.

The logs spec is clear: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.30.0/specification/logs/data-model.md#type-mapstring-any

The keys in the map are unique (duplicate keys are not allowed).

This change would affect not only the Otel collector, but any implementation of the OTLP protocol. For instance, Sumo Logic accepts OTLP and assumes that the keys are unique.

There's a reason the spec is stable. There was a time to decide this and this time has passed.

I surely understand the need for logging bridges to be performant and I'm all for preserving the data that slog or other libraries emit with duplicate keys. We should look at other ways to handle this. Jack's proposal during the SIG meeting of emitting one array field instead of multiple fields sounds reasonable to me.

andrzej-stencel avatar Mar 13 '24 11:03 andrzej-stencel

Jack's proposal during the SIG meeting of emitting one array field instead of multiple fields sounds reasonable to me.

This would cause performance overhead as well.

Sumo Logic accepts OTLP and assumes that the keys are unique

What does it mean? The proto technically allows passing duplicates. Are you ignoring data that contain duplicated keys? Passing such data was always possible.

I propose to specify it as undefined behavior. The OTLP exporters can have a configuration to deduplicate key-val pairs before sending the data. (I added this proposal to the issue description).

pellared avatar Mar 13 '24 11:03 pellared

What does it mean? The proto technically allows passing duplicates. Are you ignoring data that contain duplicated keys?

The behavior is unspecified, as this is currently not a valid payload :slightly_smiling_face:

The fact that Sumo Logic or any other implementation behaves this way or another doesn't make it less of a breaking change. Given that the OTLP definition also explicitly states:

The keys MUST be unique (it is not allowed to have more than one value with the same key).

This means that it's acceptable for an implementation to assume the keys are unique and reject a payload with duplicate keys as invalid.

andrzej-stencel avatar Mar 13 '24 11:03 andrzej-stencel

Even if the change would be seen as not acceptable for Collector:

  • other exporters (e.g. console exporter) can take benefit from it
  • OTLP exporters could have a configuration to opt-out from deduplicating key-values with the same key (trade-off: performance over predictable handling of duplicated key-values)

pellared avatar Mar 13 '24 12:03 pellared

This means that it's acceptable for an implementation to assume the keys are unique and reject a payload with duplicate keys as invalid.

I agree. I updated https://github.com/open-telemetry/opentelemetry-proto/pull/533 to reflect this.

pellared avatar Mar 13 '24 12:03 pellared

I changed https://github.com/open-telemetry/opentelemetry-specification/issues/3931 to draft as changes in OTLP are undesired.

Still, updating the definition of on model level can be helpful as it would postpone the deduplication logic to the exporters which require it (e.g. OTLP) where other can use the key-values as provided.

EDIT: I updated the issue description.

pellared avatar Mar 13 '24 12:03 pellared

I opened https://github.com/open-telemetry/opentelemetry-specification/pull/3938

pellared avatar Mar 13 '24 13:03 pellared

@open-telemetry/java-approvers @open-telemetry/php-approvers do either of your implementations de-duplicate keys?

MrAlias avatar Mar 13 '24 15:03 MrAlias

I can't pretend to know the pain any language sigs are feeling from this restriction, but I want to caution us from making a decision based on our languages implementations alone. Since we expected other communities/vendors to implement OTLP and adhere to the Logs Data Model, this is a breaking change in my eyes regardless of how many OpenTelemetry Language SIGs have already chosen to not adhere to this specific restriction.

TylerHelmuth avatar Mar 13 '24 15:03 TylerHelmuth

I can't pretend to know the pain any language sigs are feeling from this restriction, but I want to caution us from making a decision based on our languages implementations alone. Since we expected other communities/vendors to implement OTLP and adhere to the Logs Data Model, this is a breaking change in my eyes regardless of how many OpenTelemetry Language SIGs have already chosen to not adhere to this specific restriction.

I mean, if every source of telemetry is already sending duplicate keys to vendors, then they are already handling duplicate keys. Whether intentional or not. By updating the proto to reflect reality it would better communicate to these vendors the actual state of the world.

MrAlias avatar Mar 13 '24 15:03 MrAlias

then they are already handling duplicate keys

Depends on what you mean by handling. True support of duplicate keys feature would mean allowing both instances of the key to be retrievable/settable.

I can only speak for the Collector and Honeycomb, but for the collector only 1 instance of the key will be able to be interacted with and for Honeycomb only 1 instance of the key/value will be kept. So both instances handle it in the sense that they don't crash, but neither allow a user to actually take advantage of duplicate keys as a feature.

Since the spec/proto currently define uniqueness as a requirement I don't think the Collector and Honeycomb are alone in they way they handle any existing incompatibility with the specification.

Despite some existing implementations allowing duplicate keys, I don't like introducing a breaking change without a major version bump. I agree with @astencel-sumo that we attempt to solve this problem another way (although I recognize we're between a rock and a hard place).

TylerHelmuth avatar Mar 13 '24 15:03 TylerHelmuth

Given the amount of concerns raised (very valid ones!), it maybe better to find a balance - the OTel API (bridge)/SDK can avoid de-duplication logic, and that'd meet the perf goal. And the Exporter(s) can do the dedup (since they run in separate thread typically, this should be acceptable from perf standpoint) to comply with expectation of no-duplicate. The dedup logic could be to ignore duplicates (no guarantee which value ultimately wins, as it depends on implementation), or to make it list of values as suggested in earlier comments.

(OTel .NET was also planning to do this in the exporter thread anyway - https://github.com/open-telemetry/opentelemetry-dotnet/issues/4324 . But I am not sure if it makes (or is already) OTel .NET uncompliant with the spec.)

cijothomas avatar Mar 13 '24 16:03 cijothomas

Given the amount of concerns raised (very valid ones!), it maybe better to find a balance - the OTel API (bridge)/SDK can avoid de-duplication logic, and that'd meet the perf goal. And the Exporter(s) can do the dedup (since they run in separate thread typically, this should be acceptable from perf standpoint) to comply with expectation of no-duplicate. The dedup logic could be to ignore duplicates (no guarantee which value ultimately wins, as it depends on implementation), or to make it list of values as suggested in earlier comments.

Yes - I think we should NOT update the specification to allow this. However if an API/SDK decided to move where de-duplciation logic happens to optimise performance, that's entirely reasonable.

What we should ensure (due to MUST in OTLP specification) is duplicate keys don't get sent over the wire / consumers of OTLP don't need to deal with the issue.

Depends on what you mean by handling. True support of duplicate keys feature would mean allowing both instances of the key to be retrievable/settable.

I wasn't even suggesting this extreme. I simply mean someone will have to merge or drop the duplicate key at some point.

I don't think the problem of potential duplicates is new to logs, but also exists for Spans.

Regarding Go slog: https://go.dev/play/p/fzIswL0Le7j - We could declare, as OpenTelemetry, that duplicated keys are unsupported. In fact, I'd be highly curious how any logging backend handles that scenario today. I'd also be curious what the friction would be asking users to change those duplicate-label scenarios to have different labels or write a combined value.

I guess another way of saying this: Sure, slog supports it as a possibility but do people actually do it, and if so, how widespread is it. If it's a very esoteric minority, I don't think we need to support it. If it's a huge number of Go applications, we should find a way to support it that doesn't break all OpenTelemetry protocol consumers.

jsuereth avatar Mar 13 '24 19:03 jsuereth