semantic-conventions

Introduce span identity that can be used by the consumers

Open lmolkova opened this issue 6 months ago • 1 comments

Related to https://github.com/open-telemetry/opentelemetry-specification/issues/531, https://github.com/open-telemetry/oteps/pull/172

There is no reliable way to pick out, in a stream of spans, a specific span defined in a semconv group. This limits the ability to validate, transform, or visualize spans.

Historically, this problem has been solved by keeping a short list of well-known attributes (db.system.name, http.request.method, messaging.system, rpc.system, gen_ai.system, faas.system, feature_flag.provider.name) and treating the presence of such an attribute as a sign that the span follows the corresponding convention.

This approach does not solve the problem:

  • Every convention must define a required attribute that identifies it.
  • Consumers need to know up front all possible attributes that are used for semconv identification.
  • When more than one span is defined within a convention (which is becoming common), distinguishing spans can only be done with more semconv-specific heuristics and, in the general case, is not possible.
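
To make the limitation concrete, here is a minimal sketch of the attribute-presence heuristic described above. The marker attribute names come from the list in this issue; the mapping to area names, the `guess_convention` helper, and the sample spans are illustrative assumptions, not part of any OpenTelemetry API.

```python
# Marker attributes from the list above. The area names on the right
# are an illustrative assumption, not something semconv defines.
MARKER_ATTRIBUTES = {
    "db.system.name": "db",
    "http.request.method": "http",
    "messaging.system": "messaging",
    "rpc.system": "rpc",
    "gen_ai.system": "gen_ai",
    "faas.system": "faas",
    "feature_flag.provider.name": "feature_flag",
}


def guess_convention(attributes):
    """Guess the convention area from the presence of a marker attribute.

    Hypothetical helper: returns the first matching area, or None when
    no known marker attribute is present.
    """
    for key, area in MARKER_ATTRIBUTES.items():
        if key in attributes:
            return area
    return None


# Two different messaging spans (say, a publish and a process span) that
# carry the same marker attribute. Span attributes are modeled as dicts.
publish_span = {"messaging.system": "kafka", "messaging.destination.name": "orders"}
process_span = {"messaging.system": "kafka", "messaging.destination.name": "orders"}

# The heuristic identifies the area but cannot tell the two spans apart.
print(guess_convention(publish_span), guess_convention(process_span))
# A span from a convention the consumer does not know about is not classified.
print(guess_convention({"custom.attribute": 1}))
```

The consumer has to maintain `MARKER_ATTRIBUTES` by hand, and the result is at best an area, never a specific span definition.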

Having a span identity exported along with the span would let backends avoid guesswork and would provide a scalable way to classify spans.

Span identity is de facto the same as a metric name or an event name, each of which uniquely identifies the structure of a specific telemetry item within a stream of arbitrary items. Span name is dynamic and can't be used for this purpose.

How to define it:

  1. via instrumentation scope attribute: instrumentations will need to create a tracer per span identity. It minimizes the volume of data.
  2. via span attribute: higher volume of data, a bit easier for instrumentations to follow
  3. via a new top-level property on the ScopeSpans or on the Span, similar to metric metadata
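
A rough sketch of how a consumer might read the identity under options 1 and 2. Everything here is hypothetical: the `span.identity` key is a placeholder (no attribute name has been agreed on), and `Scope`, `Span`, and `resolve_identity` are simplified stand-ins for the real data model, not OpenTelemetry SDK types.

```python
from dataclasses import dataclass, field

# Placeholder attribute key; the actual name is not defined anywhere yet.
IDENTITY_KEY = "span.identity"


@dataclass
class Scope:
    """Simplified stand-in for an instrumentation scope."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    """Simplified stand-in for a span and the scope that produced it."""
    name: str
    scope: Scope
    attributes: dict = field(default_factory=dict)


def resolve_identity(span):
    """Return the span identity, or None if the span has none.

    Checks the span attribute first (option 2), then falls back to the
    scope attribute (option 1).
    """
    if IDENTITY_KEY in span.attributes:
        return span.attributes[IDENTITY_KEY]
    return span.scope.attributes.get(IDENTITY_KEY)


# Option 1: one scope (one tracer) per identity; the value is stored once
# per scope and shared by every span the scope produces.
publish_scope = Scope("example.messaging", {IDENTITY_KEY: "messaging.publish"})
first = Span("send orders", publish_scope)
second = Span("send invoices", publish_scope)

# Option 2: the value is repeated on every span.
third = Span("process orders", Scope("example.messaging"),
             {IDENTITY_KEY: "messaging.process"})

# A span without a semconv definition has no identity.
fourth = Span("custom work", Scope("example.app"))

for span in (first, second, third, fourth):
    print(span.name, "->", resolve_identity(span))
```

Under option 1 the span name can vary freely ("send orders", "send invoices") while the identity stays attached to the scope, which is where the data-volume saving comes from.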

While span identity may be useful at query time, it's arguably more of metadata about spans. It might not be important for some backends and does not (?) have to be exposed at query time, so having it on the scope level seems more appropriate. This assumption needs validation.

Span identity is tied to the span group definition in semconv. If a span does not have a definition in otel or external semconv, it does not have an identity.

Q: Is it the same as span type/category? E.g. can we use span identity to say it's a messaging/faas/gen_ai/etc span?

A: No. We need more granular identification than category, but we should probably make the id namespaced in a predictable way. E.g. messaging.publish, messaging.receive, messaging.settle, and messaging.process would all be messaging spans. We can introduce a messaging category as a separate thing that would apply across signals and identify a convention area, but let's tackle it separately.
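
If ids are namespaced as in the examples above, a consumer could derive a coarse area from the id without a separate category field. The sketch below assumes the area is the first dot-separated segment of the id; that rule and the `area_of` helper are assumptions for illustration, not something this issue specifies.

```python
def area_of(identity):
    """Return the first dot-separated segment of a span identity.

    Assumes ids look like "<area>.<span>", for example "messaging.publish".
    """
    return identity.split(".", 1)[0]


identities = [
    "messaging.publish",
    "messaging.receive",
    "messaging.settle",
    "messaging.process",
]

# All four ids from the example share the same area prefix.
print({area_of(identity) for identity in identities})
```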

lmolkova, Apr 05 '25 18:04