semantic-conventions Messaging: should polling time be included in duration of "receive" spans

Open · pyohannes opened this issue 1 year ago · 1 comment

See https://github.com/open-telemetry/opentelemetry-demo/pull/1538

The messaging semantic conventions describe "Receive" operations as follows:

"Receive" spans SHOULD be created for operations of passing messages to the application when those operations are initiated by the application code (pull-based scenarios).

However, such operations initiated by the application code can include time spent polling for messages. It needs to be confirmed whether it is desirable to include this polling time in the duration of "Receive" spans.

For example, consider an application that calls Consume on an auto-instrumented .NET Kafka client. This call blocks until a message is available. If it takes 10 minutes for a message to become available, the result is a "Receive" operation with a duration of 10 minutes. Because the .NET Kafka auto-instrumentation retroactively sets the start time of the "Receive" span to the time the Consume call was made, the "Receive" operation can start before the corresponding "Publish" operation, which can be misleading for users.
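The timing problem can be illustrated with a minimal, hypothetical timeline model. This is plain Python, not the OpenTelemetry SDK or the .NET instrumentation; the `Span` class, the `receive_span_from_consume` helper, and the timestamps are illustrative assumptions. It only shows that starting the "Receive" span at the time of the blocking call makes it include the wait and start before the "Publish" span.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span: a name plus start/end times in seconds."""
    name: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def receive_span_from_consume(call_time: float, message_available_time: float) -> Span:
    """Model a blocking Consume call whose "Receive" span starts when Consume
    was called, so its duration includes the time spent waiting for a message."""
    return Span("receive", start=call_time, end=message_available_time)


# The consumer calls Consume at t=0; the producer only publishes 10 minutes later.
publish = Span("publish", start=600.0, end=600.1)
receive = receive_span_from_consume(call_time=0.0, message_available_time=600.2)

print(receive.duration)               # 600.2
print(receive.start < publish.start)  # True: "Receive" starts before "Publish"
```

If such a "Receive" span were also a child of the "Publish" span, a trace viewer would show a child that starts 10 minutes before its parent.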

May 20 '24 11:05 pyohannes

Then there should be polling idle time metrics; otherwise the change is going to reduce the amount of information available. I think this is still an important metric for fine-tuning polling time.
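If polling time were excluded from "Receive" spans, the waiting time would still need to be recorded somewhere. The following is a minimal sketch of the kind of polling idle time metric suggested above, assuming the instrumentation knows when the blocking call started and when it returned a message. The function names, bucket boundaries, and list-based histogram are hypothetical; this does not use the OpenTelemetry metrics API.

```python
from bisect import bisect_left

# Hypothetical upper-inclusive bucket boundaries, in seconds.
BOUNDARIES = [1.0, 60.0, 600.0]


def polling_idle_time(call_time: float, returned_time: float) -> float:
    """Seconds a blocking poll waited between being called and returning a message."""
    if returned_time < call_time:
        raise ValueError("a poll cannot return before it was called")
    return returned_time - call_time


def record(histogram: list, value: float) -> None:
    """Increment the first bucket whose upper bound is >= value."""
    histogram[bisect_left(BOUNDARIES, value)] += 1


# (call_time, returned_time) pairs for three hypothetical Consume calls.
calls = [(0.0, 600.0), (600.5, 601.0), (601.0, 601.0)]
histogram = [0] * (len(BOUNDARIES) + 1)  # last bucket catches values > 600 s
for call_time, returned_time in calls:
    record(histogram, polling_idle_time(call_time, returned_time))

print(histogram)  # [2, 0, 1, 0]
```

A distribution like this keeps the information useful for tuning polling, such as how often consumers wait several minutes, without stretching individual spans.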

May 22 '24 11:05 RassK

This was discussed in the messaging workgroup.

This is how "Receive" spans are described in the semantic conventions:

"Receive" spans SHOULD be created for operations of passing messages to the application when those operations are initiated by the application code (pull-based scenarios).

This makes it clear that the span refers to the operation as initiated by the application code. In the example from the issue, this means that the polling time should be included in the duration of the "Receive" span.

To avoid misleading traces, the "Receive" span shouldn't be a child of the corresponding producer span; instead, it should link to it:

For each message it accounts for, the "Process" or "Receive" span SHOULD link to the message's creation context.

Messaging instrumentations can create additional parent/child relationships; however, this is neither mandated nor recommended by the semantic conventions. Since it yields sub-optimal results in this case, there should only be a link.

Parent/child relationships could possibly be added as an opt-in capability, similar to how it's done here.
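The link-by-default approach, with an optional parent/child relationship, can be sketched with simplified data structures. This is a hypothetical model, not the OpenTelemetry API: `SpanContext`, `Span`, `start_receive_span`, and the `parent_from_message` flag are illustrative assumptions, and the flag is not an existing configuration option of any instrumentation. By default the "Receive" span stays in the consumer's trace and only links to each message's creation context; with the opt-in flag and a single message, it also becomes a child of that context.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical, simplified span context."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    context: SpanContext
    parent: Optional[SpanContext] = None
    links: List[SpanContext] = field(default_factory=list)


def start_receive_span(span_id: str, ambient_trace_id: str,
                       creation_contexts: List[SpanContext],
                       parent_from_message: bool = False) -> Span:
    """Link to every message's creation context. Only if the hypothetical
    opt-in flag is set and exactly one message was received, also parent
    the span on that message's creation context (joining its trace)."""
    parent = None
    trace_id = ambient_trace_id
    if parent_from_message and len(creation_contexts) == 1:
        parent = creation_contexts[0]
        trace_id = parent.trace_id
    return Span("receive", SpanContext(trace_id, span_id), parent, list(creation_contexts))


producer_ctx = SpanContext("trace-a", "span-1")

default = start_receive_span("span-2", "trace-b", [producer_ctx])
print(default.parent)                  # None
print(default.links == [producer_ctx])  # True

opt_in = start_receive_span("span-3", "trace-b", [producer_ctx], parent_from_message=True)
print(opt_in.parent == producer_ctx, opt_in.context.trace_id)  # True trace-a
```

With only a link, a "Receive" span that starts before the "Publish" span no longer appears as a child starting before its parent.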

May 27 '24 10:05 pyohannes
