semantic-conventions Messaging.operation.type attribute should not be required

trafficstars

Operation type identifies common messaging operations (which may be called differently in specific systems): e.g. publish can be called send, schedule, or enqueue.

In such cases we use messaging.operation.name to represent system-friendly name.

However there are other types of operations that don't match any common type. E.g, cancel scheduled message, delete, create topic, etc...

While those are not targeted for stability, we should make sure messaging.operation.type requirement level is adequate and future-proof.

Originally posted by @lmolkova in https://github.com/open-telemetry/semantic-conventions/pull/942#discussion_r1575412911

Apr 22 '24 22:04 lmolkova

Options:

Don't populate messaging.operation.type at all. We have span kind and operation name to distinguish between different operations. We only need it to distinguish operations within the same kind (both create and publish are producers and both receive and process are consumers), but is it really necessary in presence of the operation name?
Make messaging.operation.type conditionally_required: when applicable
Keep messaging.operation.type required, but then tell that if none of the existing values apply, it should match messaging.operation.name

Apr 22 '24 22:04 lmolkova

My preference is on the Option 1 side - I want to reevaluate why we need operation type, especially if we're considering having messaging.operation.duration metric (and messaging.processing.duration), see #937

Also, with messaging.operation.name used in metrics, adding a new attribute to classify common operations would not be breaking (it won't increase number of time series), so even if discover that operation type is necessary in some cases, we should still be able to add it post-stability.

Apr 22 '24 22:04 lmolkova

I would favor a solution where messaging.operation.type remains a required attribute.

Currently it's the only required attribute besides messaging.system, and thus important for making sense of messaging telemetry in a system-independent way. Otherwise, we'll end up with a situation where sometimes there's only operation.type, sometimes there's only operation.name, and sometimes there are both (depending on the system). This makes it harder to define workflows independent of the messaging system used.

I suggest: 4. Keep messaging.operation.type required, define a new value "undefined" to be used if none of the other values apply. 5. Keep messaging.operation.type required, define additional values in system specific conventions if none of the other values apply.

Apr 23 '24 09:04 pyohannes

Currently it's the only required attribute besides messaging.system, and thus important for making sense of messaging telemetry in a system-independent way

Can we revisit this and clarify why is it important? How exactly is it going to be used?

Apr 23 '24 15:04 lmolkova

Can we revisit this and clarify why is it important? How exactly is it going to be used?

It gives you a well-defined list of values which can be interpreted independently of the messaging system used. Regarding use-cases, for example it lets you ask questions like "give me spans for all settlement operations that caused an error" in a non-system specific way.

messaging.operation.type currently has five different values, it might end up having six if we add something like undefined, or seven if we add session and transaction as additional operation types. That's something one can grasp without too much cognitive overhead. Furthermore, it makes it clear what metric measures a given operation.

For messaging.operation.name there currently would be 22 different values, just for Azure and GCP Pub/Sub alone:

send, schedule, cancel_scheduled, create_transaction, commit_transaction, rollback_transaction, receive, peek, receive_deferred, renew_message_lock, abandon, complete, defer, dead_letter, accept_session, get_session_state, set_session_state, renew_session_lock, ack, nack, subscribe, modack

I'd be curious how others think about this, but for me that's a bit overwhelming without a categorization mechanism like the operation type provides it.

If folks think messaging.operation.type shouldn't be required (or be conditionally required only if messaging.operation.name isn't given), then I think we should remove it altogether and stick with one required attribute (messaging.operation.name), and systems can populate it how they want. It's in my opinion a better solution than having two attributes where each is conditionally required on the absence of the other.

Apr 24 '24 08:04 pyohannes

Regarding use-cases, for example it lets you ask questions like "give me spans for all settlement operations that caused an error" in a non-system specific way.

Thanks, it makes total sense. So it's less of a span visualization problem and more a metrics (or custom queries) concern.

The concerns I still have:

do I need to get all settlement operations in a non-system specific way? E.g. the default dashboards would show throughput/P95 latency/error rate for operation broken down by operation name. If they break things down by operation type, then all of the non-classified operations would be collapsed
system-specific dashboards/queries should probably be fine without operation type

What I'm saying that the value operation.type provides does not seem to be essential, but most importantly, it can be added later at any point without breaking anything.

And I'm proposing to remove it altogether and make operation name required.

Apr 24 '24 19:04 lmolkova

And I'm proposing to remove it altogether and make operation name required.

That's fair, let's discuss it in the next workgroup meeting.

Apr 29 '24 12:04 pyohannes

I read everything and to me also makes sense that operation name is required. I like the idea of a high-level classification but I'm not sure if it really makes sense which probably is a good indication to leave it out for now.

Apr 30 '24 09:04 joaopgrassi

The PR for gen_ai.operation.name has another interesting approach that uses namespaced values for the operation name, thus combining a type with a name.

For messaging, this could result in values like e. g. settle.ack, settle.nack, or publish.submit.

May 02 '24 13:05 pyohannes

Based on the SIG discussion, we're tentatively aligned on removing messaging.operation.type for the time being and can re-introduce it later. But it's blocked on https://github.com/open-telemetry/semantic-conventions/issues/937 since operation type is a part of metrics name.

May 02 '24 20:05 lmolkova

semantic-conventions semantic-conventions copied to clipboard

Messaging.operation.type attribute should not be required

semantic-conventions
semantic-conventions copied to clipboard