opentelemetry-specification
Better structure for span identification
Consider these canonical examples:
| Span Name | Guidance |
|---|---|
| `get_account` | Good, and `account_id=42` would make a nice Span attribute |
| `get_account/{accountId}` | Also good (using the "HTTP route") |
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#span
These are, quite frankly, terrible identifications for a span. The headline information gives me no clue whether I'm looking at an HTTP request/response, an RPC call, a database procedure/query, a cloud function, a cache lookup, an internal computation, etc.
The trace itself isn't a particularly good example, but consider Datadog's tracing interface:
There are both:

- A span "type" that is instrumentation-determined (Datadog vocab: "name" or "operation"), e.g.:
  - `http.request`
  - `mongodb.query`
  - `lambda.invocation`
  - `grpc.call`
  - `java.function`
- A span "name" that is application-determined (Datadog vocab: "resource"), e.g.:
  - `get_account`
  - `/users/{id}`
  - `SELECT * FROM pokedex`
  - `com.example.Thing.run`

Whether this is done as syntax in the span name (`TYPE:NAME`) or as an attribute (`type: TYPE`, `component: TYPE`), there should be some standard method of assigning a classification.
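A minimal Python sketch of the two options, assuming a hypothetical `span.type` attribute (not a key defined by the spec) and, when that attribute is absent, treating everything before the first colon in the span name as the type:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical attribute key for the instrumentation-determined span type.
# This name is an assumption for illustration, not part of the spec.
SPAN_TYPE_KEY = "span.type"


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)


def classify(span: Span) -> tuple[str, str]:
    """Return (type, name), preferring the attribute over a TYPE:NAME prefix."""
    span_type = span.attributes.get(SPAN_TYPE_KEY)
    if span_type is not None:
        return span_type, span.name
    # Fall back to the "TYPE:NAME" syntax embedded in the span name.
    prefix, sep, rest = span.name.partition(":")
    if sep and rest:
        return prefix, rest
    # Neither convention was followed: the type cannot be recovered.
    return "unknown", span.name


def display_label(span: Span) -> str:
    """Render a label that keeps same-named spans of different types apart."""
    span_type, name = classify(span)
    return f"{span_type}:{name}"


if __name__ == "__main__":
    for s in [
        Span("get_account", {SPAN_TYPE_KEY: "http.request"}),
        Span("mongodb.query:get_account"),
        Span("get_account"),
    ]:
        print(display_label(s))
```

Either convention lets two spans both named `get_account` render differently; the attribute form avoids overloading the name, while the prefix form survives in tools that only display the name.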
Otherwise, I wind up with spans auto-named "get_account", all of wildly different flavors (HTTP, RPC, message queue, DB), and I'm left trying to tell them apart. Naturally, with enough inspection of attributes that is possible, but there are a lot of attributes to look through (a high-level view of a trace usually doesn't show them because there are so many).
(Note that I am not talking about the tracer name, which refers to the instrumentation. I am talking about either the instrumented technology or the type of operation.)
I believe this overlaps with #271, though there is little recorded discussion, so I'm not entirely sure what happened.
Backends should already be able to deduce the type of action using the attributes added to spans according to the semantic conventions defined in this spec. The easiest way would be an if-else cascade checking for the presence of mandatory attributes in a certain order (`db.type`, `messaging.system`, `http.method`, `rpc.service`, ...). Therefore, the problem you mentioned is merely one of the backend and not of the data model itself.
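The cascade described above can be sketched in Python as an ordered lookup table, which behaves the same as an if/elif chain. The attribute keys are the ones listed in the comment; the returned type labels and the `"internal"` fallback are hypothetical:

```python
# Ordered (marker attribute, span type) pairs, checked top to bottom.
# Keys come from the comment above; the type labels are illustrative only.
MARKERS = [
    ("db.type", "db"),
    ("messaging.system", "messaging"),
    ("http.method", "http"),
    ("rpc.service", "rpc"),
]


def deduce_span_type(attributes: dict) -> str:
    """Return the type of the first marker attribute present on the span."""
    for key, span_type in MARKERS:
        if key in attributes:
            return span_type
    # Hypothetical fallback when no marker attribute is present.
    return "internal"
```

The order matters: a span carrying both `db.type` and `http.method` is reported as `"db"` because that marker is checked first, which is exactly the kind of rule every backend would have to agree on.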
PS: Does the color coding in your screenshot correspond to the types ("names") that you list below, or how can you tell which one it is?
@arminru The Datadog trace color scheme is either by service or by host, depending on settings.
> Backends should already be able to deduce the type of action using the attributes added to spans according to the semantic conventions defined in this spec. The easiest way would be an if-else cascade checking for the presence of mandatory attributes in a certain order
And exactly what backend would you recommend for this?
I really don't get the hesitation to give spans a properly discriminated type.
@pauldraper
> And exactly what backend would you recommend for this?
Well I work on a backend where we do it this way, so my recommendation would most probably be strongly biased at least. 😄
> I really don't get the hesitation to give spans a properly discriminated type.
The type you'd like to have added was actually there in the past: it was named `component` but was removed in #271. Unfortunately, the rationale for the decision was not properly documented, either on the issue or in the meeting notes. Based on the description provided by @yurishkuro, who opened the issue, I'd say it was removed for the reason he stated, namely the fact that `component` is redundant, since the type/kind of span can be inferred by looking at (required) span attributes, as I also mentioned above (https://github.com/open-telemetry/opentelemetry-specification/issues/531#issuecomment-606036462).
Apart from that, at the time the issue was opened, `component` was not well specified. For database spans, for example, `component` was not defined as a fixed string such as `"db"` or `"database"` but rather as an unbounded, free-text value, as initially criticized in #245 (the title was reworded after `component` was removed):

> `component`: Database driver name or database name (when known) "JDBI", "jdbc", "odbc", "postgreSQL".

This definition would not have been of any help for the purposes you described, but it could've been fixed as well, of course, rather than removing `component` entirely.
> the fact that `component` is redundant
Not really. Without it, you need to add information (an algorithm for deducing type) that you wouldn't otherwise.
> This definition would not have been of any help for the purposes you described but could've been fixed
It certainly would help. I don't necessarily need it to be standardized. I just need unique operation names.
The canonical examples of good span names are `get_account` and `get_account/{accountId}`.
I have no earthly idea which of the various flavors of "get_account" I have in my stack: database, HTTP, in-process function, cloud function, AMP message? I don't necessarily need a perfectly uniform component classification scheme, but I do need to tell the HTTP request `get_account` apart from the database query `get_account` when they show up in a report, list, etc. And tacking on 20 attributes of every possible kind to achieve that uniqueness is unwieldy.
Now, perhaps the specification just has really, really bad examples of span names. Maybe the good span names would be `HTTP:get_account`, `JDBI:get_account`, etc. I don't care whether it's an attribute or a span name prefix; I just want to tell my operations apart, and currently the spec seems to do a very bad job of that.
Programming a backend just to do that basic thing... that seems unnecessarily complex and poorly supported.
Since you're complaining about the span name: I think it currently has an unclear purpose; see related issue #557.
Hi @pauldraper, I've taken a shot at resolving some of the issues raised in this thread and others here (https://github.com/open-telemetry/opentelemetry-specification/pull/730) by adding display hints. Please take a look.
I suggest removing the `release:required-for-ga` label.
The "component" approach was already discussed and rejected in the past. The type of the Span can be deduced from the presence of required attributes. It may not be convenient, but it is possible. It is also more powerful, since it allows recording multiple types simultaneously, while a single "type" or "component" does not (what is the type of a Span representing an HTTP call to a database? Is it "http" or "db"?).
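To illustrate the "multiple types" point, a minimal sketch that returns every matching type instead of stopping at the first match. The marker keys follow the semantic conventions mentioned in this thread; the type labels and the Elasticsearch example values are illustrative assumptions:

```python
from __future__ import annotations

# Marker attribute -> illustrative type label.
MARKER_ATTRIBUTES = {
    "db.system": "db",
    "http.method": "http",
    "messaging.system": "messaging",
    "rpc.service": "rpc",
}


def span_types(attributes: dict) -> set[str]:
    """Collect every type whose marker attribute is present on the span."""
    return {label for key, label in MARKER_ATTRIBUTES.items() if key in attributes}


# An HTTP call to a database is both "http" and "db"; nothing forces a choice.
http_db_call = {"http.method": "POST", "db.system": "elasticsearch"}
print(sorted(span_types(http_db_call)))
```

A single `component` field would have to pick one of these labels, whereas the attribute set keeps both facts.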
It is likely too late for 1.0 to introduce a new way of specifying the Span type that is better than what we have. There are likely better ways, but I don't think we have time to introduce, discuss, and agree on an approach quickly enough to make it part of the 1.0 release.
+1 on making this `release:after-ga`.
From the issue triage meeting today: I'm changing the label to `release:after-ga`, since it looks from the comments like this can be punted.
1. Does anyone use Datadog? Or am I the only user of the largest commercial monitoring platform?
Because I don't see how OTel is going to work with Datadog unless it can intelligently produce an operation and resource ("type" and "name").
2. Does anyone think these trace names are actually good?
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span
Like, what the heck are they... a file, a gRPC operation, an HTTP request, a DB query, something else?
Not at all obvious.
> Like, what the heck are they... a file, a gRPC operation, an HTTP request, a DB query, something else?
> Not at all obvious.
The details are specified in the semantic conventions: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions
> Because I don't see how OTel is going to work with Datadog unless it can intelligently produce an operation and resource ("type" and "name").
All semantic conventions should have a "marker attribute", or at least a set thereof. E.g., a database operation can be identified by having a `db.system` span attribute, an HTTP span always has `http.method`, etc. (but see #653).
Providing some feedback as a current Datadog user (the platform we use to manage traces at my company), a Jaeger user (for local testing), and someone trying to find a way to standardize, within my company, not only spans with defined use cases (HTTP/gRPC requests, DB, SQS, Lambda, etc.) but also private, custom conventions.
I agree and understand that `db.system` or `http.method` can be used to identify a span's "type of action", "component", "type", or whatever we want to call it. However, as Paul says, just checking whether this tag exists is not enough.
We might add new tags to the spec or deprecate some. Technology evolves, and we will certainly need to add or remove metadata in spans. However, it is not feasible to operate on them if we do not know which spec version we are targeting.
This is where it makes sense to have a component, type, or some other identifier that captures not only the type of the span but also the version of the schema we are mapping it to! This would simplify parsers, make it easier for users to identify which kind of data they have available, and make it easier to upgrade queries to support new standardized semantic conventions or potential future additions.
Internally at my company, I'm working to define span schemas for different but similar types of spans that map to business logic. These are totally independent of the semantic conventions defined in OTel, but we still face similar challenges. We try to adopt and implement the relevant tags wherever possible for a given business-logic span. However, we are iterating on these schemas and versioning them, so we end up with `payments-v2` or `identity-v4` (as examples), and we know which structure and tags to expect in each span.
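A minimal sketch of how such versioned internal schemas could be checked, assuming a hypothetical `company.span_schema` attribute that names the schema and made-up required tags for `payments-v2` and `identity-v4`:

```python
from __future__ import annotations

# Hypothetical registry: each versioned schema lists the tags its spans must carry.
SCHEMAS = {
    "payments-v2": {"payment.id", "payment.amount", "payment.currency"},
    "identity-v4": {"user.id", "auth.method"},
}

# Hypothetical attribute that declares which schema a span follows.
SCHEMA_KEY = "company.span_schema"


def missing_tags(attributes: dict) -> set[str]:
    """Return the required tags absent from a span, per its declared schema."""
    schema_id = attributes.get(SCHEMA_KEY)
    if schema_id not in SCHEMAS:
        raise ValueError(f"unknown span schema: {schema_id!r}")
    return SCHEMAS[schema_id] - set(attributes)
```

Because the schema identifier carries its version, a processor never has to guess the span's shape from whichever tags happen to be present.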
Otherwise, there is no magic way to know how spans will change in the future, and of course, it becomes really hard for processors to identify them or understand which kind of span we are looking at. The only option, as Paul says, is a crazy algorithm that tries to identify the type of span, and that will surely have issues whenever the schema changes.
Maybe it is not a blocker for a GA release, but it is definitely a must to define how OTel will version changes to the semantic conventions (which perhaps should be, or already are, defined as schemas?), so that after 1-2 years of convention changes we still know which kind of HTTP/DB/queue/etc. metadata we are talking about.
> This is where it makes sense to have a component, type, or some other identifier that captures not only the type of the span but also the version of the schema we are mapping it to! This would simplify parsers, make it easier for users to identify which kind of data they have available, and make it easier to upgrade queries to support new standardized semantic conventions or potential future additions.
@Sturgelose It is already possible to include the version of the spec a span conforms to: a SchemaURL can be included in the emitted telemetry. See schemas: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/schemas/overview.md
Schemas are also how the evolution of the conventions is supposed to be handled. See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#semantic-conventions-stability
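A rough sketch of version dispatch on the schema URL, assuming the URL ends in a version segment (as in `https://opentelemetry.io/schemas/1.4.0`). The cutoff version is invented purely for illustration, and a real backend would more likely apply the transformations published in the schema file than branch by hand:

```python
from __future__ import annotations

from urllib.parse import urlparse

# Invented for illustration: pretend spans produced under conventions older
# than this version mark database spans with "db.type" instead of "db.system".
HYPOTHETICAL_RENAME_VERSION = (1, 5, 0)


def schema_version(schema_url: str) -> tuple[int, ...]:
    """Parse the trailing version segment, e.g. '.../schemas/1.4.0' -> (1, 4, 0)."""
    last_segment = urlparse(schema_url).path.rstrip("/").rsplit("/", 1)[-1]
    return tuple(int(part) for part in last_segment.split("."))


def db_marker_key(schema_url: str) -> str:
    """Pick the marker attribute that identifies DB spans for this schema version."""
    if schema_version(schema_url) < HYPOTHETICAL_RENAME_VERSION:
        return "db.type"
    return "db.system"


def is_db_span(schema_url: str, attributes: dict) -> bool:
    return db_marker_key(schema_url) in attributes
```

The point is only that the schema URL tells a consumer which attribute names to expect, so a rename no longer silently breaks type detection.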
Does this address your concerns?