semantic-conventions
semantic-conventions copied to clipboard
Messaging.operation.type attribute should not be required
Operation type identifies common messaging operations (which may be called differently in specific systems): e.g. publish can be called send, schedule, or enqueue.
In such cases we use messaging.operation.name to represent system-friendly name.
However there are other types of operations that don't match any common type. E.g, cancel scheduled message, delete, create topic, etc...
While those are not targeted for stability, we should make sure messaging.operation.type requirement level is adequate and future-proof.
Originally posted by @lmolkova in https://github.com/open-telemetry/semantic-conventions/pull/942#discussion_r1575412911
Options:
-
Don't populate
messaging.operation.typeat all. We have span kind and operation name to distinguish between different operations. We only need it to distinguish operations within the same kind (bothcreateandpublishare producers and bothreceiveandprocessare consumers), but is it really necessary in presence of the operation name? -
Make
messaging.operation.typeconditionally_required: when applicable -
Keep
messaging.operation.typerequired, but then tell that if none of the existing values apply, it should matchmessaging.operation.name
My preference is on the Option 1 side - I want to reevaluate why we need operation type, especially if we're considering having
messaging.operation.duration metric (and messaging.processing.duration), see #937
Also, with messaging.operation.name used in metrics, adding a new attribute to classify common operations would not be breaking (it won't increase number of time series), so even if discover that operation type is necessary in some cases, we should still be able to add it post-stability.
I would favor a solution where messaging.operation.type remains a required attribute.
Currently it's the only required attribute besides messaging.system, and thus important for making sense of messaging telemetry in a system-independent way. Otherwise, we'll end up with a situation where sometimes there's only operation.type, sometimes there's only operation.name, and sometimes there are both (depending on the system). This makes it harder to define workflows independent of the messaging system used.
I suggest:
4. Keep messaging.operation.type required, define a new value "undefined" to be used if none of the other values apply.
5. Keep messaging.operation.type required, define additional values in system specific conventions if none of the other values apply.
Currently it's the only required attribute besides messaging.system, and thus important for making sense of messaging telemetry in a system-independent way
Can we revisit this and clarify why is it important? How exactly is it going to be used?
Can we revisit this and clarify why is it important? How exactly is it going to be used?
It gives you a well-defined list of values which can be interpreted independently of the messaging system used. Regarding use-cases, for example it lets you ask questions like "give me spans for all settlement operations that caused an error" in a non-system specific way.
messaging.operation.type currently has five different values, it might end up having six if we add something like undefined, or seven if we add session and transaction as additional operation types. That's something one can grasp without too much cognitive overhead. Furthermore, it makes it clear what metric measures a given operation.
For messaging.operation.name there currently would be 22 different values, just for Azure and GCP Pub/Sub alone:
send, schedule, cancel_scheduled, create_transaction, commit_transaction, rollback_transaction, receive, peek, receive_deferred, renew_message_lock, abandon, complete, defer, dead_letter, accept_session, get_session_state, set_session_state, renew_session_lock, ack, nack, subscribe, modack
I'd be curious how others think about this, but for me that's a bit overwhelming without a categorization mechanism like the operation type provides it.
If folks think messaging.operation.type shouldn't be required (or be conditionally required only if messaging.operation.name isn't given), then I think we should remove it altogether and stick with one required attribute (messaging.operation.name), and systems can populate it how they want. It's in my opinion a better solution than having two attributes where each is conditionally required on the absence of the other.
Regarding use-cases, for example it lets you ask questions like "give me spans for all settlement operations that caused an error" in a non-system specific way.
Thanks, it makes total sense. So it's less of a span visualization problem and more a metrics (or custom queries) concern.
The concerns I still have:
- do I need to get all settlement operations in a non-system specific way? E.g. the default dashboards would show throughput/P95 latency/error rate for operation broken down by operation name. If they break things down by operation type, then all of the non-classified operations would be collapsed
- system-specific dashboards/queries should probably be fine without operation type
What I'm saying that the value operation.type provides does not seem to be essential, but most importantly, it can be added later at any point without breaking anything.
And I'm proposing to remove it altogether and make operation name required.
And I'm proposing to remove it altogether and make operation name required.
That's fair, let's discuss it in the next workgroup meeting.
I read everything and to me also makes sense that operation name is required. I like the idea of a high-level classification but I'm not sure if it really makes sense which probably is a good indication to leave it out for now.
The PR for gen_ai.operation.name has another interesting approach that uses namespaced values for the operation name, thus combining a type with a name.
For messaging, this could result in values like e. g. settle.ack, settle.nack, or publish.submit.
Based on the SIG discussion, we're tentatively aligned on removing messaging.operation.type for the time being and can re-introduce it later.
But it's blocked on https://github.com/open-telemetry/semantic-conventions/issues/937 since operation type is a part of metrics name.