Not clear which messages a schema URL applies to
Inside a single ResourceSpans message, there are multiple occurrences of schema_url. As a consumer of OpenTelemetry data, I'm trying to understand which attributes these schemas apply to. While this question applies to logs, traces and metrics, I'm going to reference the tracing-related bits, as this is where I initially stumbled across my problem.
- The ResourceSpans message has its own schema_url field: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L58-L63
  > This schema_url applies to the data in the "resource" field. It does not apply to the data in the "scope_spans" field which have their own schema_url field.

  This checks out.
- Following the comment into the scope_spans field and looking at the ScopeSpans message, it also has its own schema_url field: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L76-L80
  > This schema_url applies to all spans and span events in the "spans" field.

  This kinda checks out, but I'm missing two things here:
  - It doesn't talk about links, which I guess it must apply to, too?
  - It apparently doesn't apply to the scope, which (a) seems inconsistent with how the ResourceSpans schema_url behaves (applying to resource) and (b) leaves me wondering which schema the scope is recorded in.
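
To make this reading concrete, here is a minimal Python sketch that walks a ResourceSpans-like structure and records which schema_url the proto comments assign to each element. The dataclasses are simplified, hypothetical stand-ins for the generated protobuf classes (only the fields needed here), and the scope and links are deliberately mapped to None because the quoted comments don't say which schema_url covers them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Simplified, hypothetical mirrors of the OTLP trace messages
# (not the generated protobuf classes).
@dataclass
class SpanEvent:
    name: str


@dataclass
class SpanLink:
    span_id: str


@dataclass
class Span:
    name: str
    events: List[SpanEvent] = field(default_factory=list)
    links: List[SpanLink] = field(default_factory=list)


@dataclass
class ScopeSpans:
    scope_name: str
    schema_url: str
    spans: List[Span] = field(default_factory=list)


@dataclass
class ResourceSpans:
    schema_url: str
    scope_spans: List[ScopeSpans] = field(default_factory=list)


def schema_url_per_element(rs: ResourceSpans) -> List[Tuple[str, str, Optional[str]]]:
    """Return (kind, name, schema_url) following the proto comments literally.

    None marks elements the comments do not cover (the scope itself and span links).
    """
    out: List[Tuple[str, str, Optional[str]]] = [("resource", "resource", rs.schema_url)]
    for ss in rs.scope_spans:
        out.append(("scope", ss.scope_name, None))  # not stated in the ScopeSpans comment
        for span in ss.spans:
            out.append(("span", span.name, ss.schema_url))
            for ev in span.events:
                out.append(("event", ev.name, ss.schema_url))
            for link in span.links:
                out.append(("link", link.span_id, None))  # not stated either
    return out
```

The two None entries are exactly the gaps listed above.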
When I discovered this uncertainty, I followed the link in the source code comments, which led me to this section. The ruleset established there doesn't seem to match the comments in the proto file:

> The schema_url field in the ResourceSpans [...] messages applies to the contained Resource, Span, SpanEvent [...] messages.

This clashes with the description given in the proto file.

> The schema_url field in the InstrumentationLibrarySpans message applies to the contained Span and SpanEvent messages.

Checks out, but with the same caveats I pointed out for the description in the proto file.
So without further explanation, I can't know for sure which field takes precedence, since according to this description both apply to Span and SpanEvent.
Fortunately, there is an explanation there:

> If schema_url field is non-empty both in Resource* message and in the contained InstrumentationLibrary* message then the value in InstrumentationLibrary* message takes the precedence.
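
Read literally, the quoted precedence rule (still using the old InstrumentationLibrary* names) amounts to something like the Python sketch below. The function name is hypothetical, and the fallback branch is an assumption, since the text only addresses the case where both fields are non-empty. As @pellared notes right after this, the rule itself is most likely not correct.

```python
def effective_span_schema_url(resource_schema_url: str, scope_schema_url: str) -> str:
    """Schema URL for spans under the precedence rule as quoted from the spec.

    An empty string means "no schema URL provided" (the proto3 default).
    The quoted text only covers the case where both values are non-empty;
    falling back to the resource-level value otherwise is an ASSUMPTION.
    """
    if scope_schema_url:
        # Scope-level value is set: it wins when both are set
        # and is trivially used when it is the only one set.
        return scope_schema_url
    # Assumed fallback when only the resource-level value (or neither) is set.
    return resource_schema_url
```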
But as @pellared already pointed out here:

> this is most likely not correct

To add to the confusion, the linked page has another section right at the top that reads as follows:

> OpenTelemetry instrumentation libraries include the OpenTelemetry Schema URL in all emitted telemetry. This is currently work-in-progress, here is an example of how it is done in Go SDK’s Resource detectors.

As per @pellared:

> this should have a different example from an instrumentation library and not a resource detector

To sum this issue up: as a consumer, it's not obvious to me how to correctly process OTLP messages with regard to the given schema URLs. The docs as well as the spec seem to be outdated and/or wrong.
I'm happy to help rephrase some of the comments/docs I've linked but I need some guidance on what the correct way actually is before I get going.
Or maybe I've just navigated myself into a corner and it's totally obvious how to process OTLP messages correctly. But I think in either case there are some takeaways from this issue.
Cheers

> I'm happy to help rephrase some of the comments/docs I've linked but I need some guidance on what the correct way actually is before I get going.

Yeah, it looks like there's some drift, and the spec needs to be updated a bit. If you can send a PR, that would be a great way to get the ball rolling and get others to check it out!
To the best of my knowledge, there are the following errors:
- Instrumentation library was renamed to scope. Anything that says InstrumentationLibraryX, like InstrumentationLibrarySpans, should be ScopeX, like ScopeSpans.
- The schema_url field in the ResourceSpans message DOES NOT apply to the contained Span and SpanEvent messages. The schema_url in ScopeSpans DOES.
- > If schema_url field is non-empty both in Resource* message and in the contained InstrumentationLibrary* message then the value in InstrumentationLibrary* message takes the precedence.

  I think this is just outdated.

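Under the corrections listed above, a consumer would not fall back to the resource-level URL for spans at all. A minimal Python sketch of that reading; the function and element names are hypothetical, and links and the scope are left out because they aren't settled at this point in the thread.

```python
from typing import Optional


def schema_url_for(element: str, resource_spans_schema_url: str, scope_spans_schema_url: str) -> Optional[str]:
    """Which schema_url governs an element under the corrected reading.

    ResourceSpans.schema_url covers only the resource; ScopeSpans.schema_url
    covers spans and span events. There is no precedence or fallback between
    the two: an empty string (the proto3 default) means the schema is not
    specified for that element, returned here as None.
    """
    if element == "resource":
        url = resource_spans_schema_url
    elif element in ("span", "span_event"):
        url = scope_spans_schema_url
    else:
        raise ValueError(f"not covered by this sketch: {element!r}")
    return url or None
```

The key difference to the outdated precedence rule is the third case in the usage: an empty scope-level URL does not inherit the resource-level one.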
Additionally:

> It doesn't talk about links which I guess it must apply to, too?

A linked span is just the span context to help you find the span in your backend. That span should have a resource from whatever ScopeSpans message is associated with it. A link is just a pointer. The linked span may be in the same ScopeSpans message, but may also not be.
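
Since a link only carries the span context, a consumer that wants the linked span (and the schema it was recorded in) has to look it up. A minimal Python sketch, assuming spans from several ScopeSpans messages (possibly from different exports) have already been collected; the classes are simplified hypothetical stand-ins, with trace and span IDs as plain strings instead of bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Link:
    trace_id: str
    span_id: str


@dataclass
class Span:
    trace_id: str
    span_id: str
    name: str
    links: List[Link] = field(default_factory=list)


@dataclass
class ScopeSpans:
    schema_url: str
    spans: List[Span] = field(default_factory=list)


SpanIndex = Dict[Tuple[str, str], Tuple[Span, str]]


def index_spans(all_scope_spans: Iterable[ScopeSpans]) -> SpanIndex:
    """Index every span by (trace_id, span_id) together with its ScopeSpans.schema_url."""
    index: SpanIndex = {}
    for ss in all_scope_spans:
        for span in ss.spans:
            index[(span.trace_id, span.span_id)] = (span, ss.schema_url)
    return index


def resolve_link(link: Link, index: SpanIndex) -> Optional[Tuple[Span, str]]:
    """Follow a link; the linked span may sit in another ScopeSpans, or may not have arrived at all."""
    return index.get((link.trace_id, link.span_id))
```

The schema URL returned for the linked span is the one of the ScopeSpans that contains it, not the one of the span holding the link.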

> It apparently doesn't apply to the scope which (a) seems inconsistent with how the ResourceSpans schema_url behaves (applying to resource) and (b) leaves me wondering in which schema the scope is recorded in

It is a bit inconsistent in the way the data is laid out, but it just avoids duplication because all spans in a scope should have the same schema url. The resource only has one set of attributes, so the resource and schema_url can be at the same level in the tree. Each scope can have many spans with their own unique attributes and events, which in turn may have attributes. Because all spans and span attributes within a scope should have the same schema_url, the url is at the scope level of the tree.
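
The duplication argument can be illustrated from the producing side: if spans are grouped by instrumentation scope and schema URL, the URL is stored once per ScopeSpans instead of once per span. A minimal Python sketch; FinishedSpan, the grouping key (scope name, scope version, schema URL) and the trimmed-down ScopeSpans class are assumptions for illustration, not how any particular SDK exporter is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FinishedSpan:
    # Hypothetical producer-side span record.
    name: str
    scope_name: str
    scope_version: str
    schema_url: str


@dataclass
class ScopeSpans:
    # Trimmed-down stand-in: span names only, one schema_url per group.
    scope_name: str
    scope_version: str
    schema_url: str
    span_names: List[str] = field(default_factory=list)


def group_into_scope_spans(spans: List[FinishedSpan]) -> List[ScopeSpans]:
    """Group spans so that each ScopeSpans carries one schema_url shared by all its spans."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for s in spans:
        groups[(s.scope_name, s.scope_version, s.schema_url)].append(s.name)
    # Dicts keep insertion order, so groups appear in first-seen order.
    return [ScopeSpans(name, version, url, names) for (name, version, url), names in groups.items()]
```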

> yeah, it looks like there's some drift, and the spec needs to be updated a bit, if you can send a PR that would be a great way to get the ball rolling and get others to check it out!

Sorry, I was working on other things. Will circle back to this and try to create a PR to clear up some confusion.

> It is a bit inconsistent in the way the data is laid out, but it just avoids duplication because all spans in a scope should have the same schema url. The resource only has one set of attributes, so the resource and schema_url can be at the same level in the tree. Each scope can have many spans with their own unique attributes and events, which in turn may have attributes. Because all spans and span attributes within a scope should have the same schema_url, the url is at the scope level of the tree.

Thanks, that clarifies quite a bit!

> A linked span is just the span context to help you find the span in your backend. That span should have a resource from whatever ScopeSpans message is associated with it. A link is just a pointer. The linked span may be in the same ScopeSpans message, but may also not be.

Yes, I'm aware. I was referring to the fact that a link can be annotated with attributes, just like an event. E.g. see your own explanation:

> Each scope can have many spans with their own unique attributes and events, which in turn may have attributes.

> [...] which in turn may have attributes

This is also true for links. They can have their own attributes. Obviously they apply to the link and not the span the link points to.
My confusion stems from the fact that events get an explicit mention in the spec, while links do not:

> The schema_url field in the InstrumentationLibrarySpans message applies to the contained Span and SpanEvent messages.

I assume this should include SpanLink, too, no?
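
If links are indeed meant to be covered, a consumer would treat link attributes exactly like event attributes. A minimal Python sketch under that assumption (it is the open question here, not confirmed spec behavior); the classes are simplified hypothetical stand-ins for the OTLP messages, with attribute values reduced to strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Attributes = Dict[str, str]


@dataclass
class SpanEvent:
    name: str
    attributes: Attributes = field(default_factory=dict)


@dataclass
class SpanLink:
    span_id: str
    attributes: Attributes = field(default_factory=dict)


@dataclass
class Span:
    name: str
    attributes: Attributes = field(default_factory=dict)
    events: List[SpanEvent] = field(default_factory=list)
    links: List[SpanLink] = field(default_factory=list)


@dataclass
class ScopeSpans:
    schema_url: str
    spans: List[Span] = field(default_factory=list)


def attribute_sets(ss: ScopeSpans) -> Iterator[Tuple[str, Attributes, str]]:
    """Yield (owner, attributes, schema_url) for every attribute set in a ScopeSpans.

    ASSUMPTION: link attributes are interpreted with ScopeSpans.schema_url,
    just like span and event attributes. The quoted spec text only names
    Span and SpanEvent.
    """
    for span in ss.spans:
        yield (f"span:{span.name}", span.attributes, ss.schema_url)
        for ev in span.events:
            yield (f"event:{ev.name}", ev.attributes, ss.schema_url)
        for link in span.links:
            # These attributes describe the link itself, not the span it points to.
            yield (f"link:{link.span_id}", link.attributes, ss.schema_url)
```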