
Not clear which messages a schema url applies to

Open lukasmalkmus opened this issue 1 year ago • 3 comments

Inside a single resource spans message, there are multiple occurrences of schema_url. As a consumer of OpenTelemetry data, I'm trying to understand which attributes these schemas apply to. While this question applies to logs, traces, and metrics, I'm going to reference the tracing-related bits, as this is where I initially stumbled across my problem.

  1. The ResourceSpans message has its own schema_url field: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L58-L63.

This schema_url applies to the data in the "resource" field. It does not apply to the data in the "scope_spans" field which have their own schema_url field.

This checks out.

  2. Following the comment going into the scope_spans field and looking at the ScopeSpans message, it also has its own schema_url field: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L76-L80.

This schema_url applies to all spans and span events in the "spans" field.

This kinda checks out, but I'm missing two things here:

  • It doesn't talk about links which I guess it must apply to, too?
  • It apparently doesn't apply to the scope, which (a) seems inconsistent with how the ResourceSpans schema_url behaves (applying to resource) and (b) leaves me wondering which schema the scope is recorded in
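To make the two occurrences concrete, here is a minimal Python sketch of the nesting as the proto comments describe it. The dataclasses are hypothetical stand-ins, not the generated protobuf classes, and the schema URLs, scope name, and attribute values are made-up example data.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the OTLP trace messages (not the generated protobuf classes).
@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class ScopeSpans:
    scope_name: str
    spans: list
    schema_url: str = ""  # per the proto comment: applies to spans and span events


@dataclass
class ResourceSpans:
    resource_attributes: dict
    scope_spans: list
    schema_url: str = ""  # per the proto comment: applies to the resource only


# Example data; all values are illustrative.
msg = ResourceSpans(
    resource_attributes={"service.name": "checkout"},
    schema_url="https://opentelemetry.io/schemas/1.24.0",
    scope_spans=[
        ScopeSpans(
            scope_name="example-http-instrumentation",
            schema_url="https://opentelemetry.io/schemas/1.21.0",
            spans=[Span("GET /cart", {"http.request.method": "GET"})],
        )
    ],
)


def schema_urls(message):
    """Return (resource-level schema_url, [scope-level schema_url, ...])."""
    return message.schema_url, [ss.schema_url for ss in message.scope_spans]
```

Nothing in this layout says which schema the scope's own name, version, and attributes are recorded in, which is the gap described in (b).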

When I discovered this uncertainty, I followed the link in the source code comments which led me to this section. The ruleset established there doesn't seem to match with the comments in the proto file:

The schema_url field in the ResourceSpans [...] messages applies to the contained Resource, Span, SpanEvent [...] messages.

This clashes with the description given in the proto file.

The schema_url field in the InstrumentationLibrarySpans message applies to the contained Span and SpanEvent messages.

Checks out but with the same constraints I pointed out for the description in the proto file.

So without a further explanatory comment, I can't know for sure which field takes precedence, because per this description both of them cater to Span and SpanEvent.

Fortunately, there is an explanation there:

If schema_url field is non-empty both in Resource* message and in the contained InstrumentationLibrary* message then the value in InstrumentationLibrary* message takes the precedence.
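Read literally, that rule would amount to something like the Python sketch below. The function name is made up for illustration, and this only restates the text in the docs; it is not meant as guidance on what a consumer should actually do.

```python
def effective_schema_url(resource_schema_url: str, scope_schema_url: str) -> str:
    """Literal reading of the quoted rule: a non-empty inner (scope-level) value wins.

    Hypothetical helper; it restates the documented rule, which may be outdated.
    """
    return scope_schema_url or resource_schema_url
```

Note that the fallback to the resource-level value when the inner one is empty is implied by "takes the precedence" and is not spelled out in the docs.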

But as @pellared already pointed out here:

this is most likely not correct

To stir more confusion, the link I mentioned has another section right at the top that reads as follows:

OpenTelemetry instrumentation libraries include the OpenTelemetry Schema URL in all emitted telemetry. This is currently work-in-progress, here is an example of how it is done in Go SDK’s Resource detectors.

As per @pellared:

this should have a different example from an instrumentation library and not a resource detector


To sum this issue up: As a consumer, it's not obvious to me how to correctly process OTLP messages with regard to respecting the given schema urls. The docs as well as the spec seem to be outdated and/or wrong.

I'm happy to help rephrase some of the comments/docs I've linked but I need some guidance on what the correct way actually is before I get going.

Or maybe I've just navigated myself into a corner and it's totally obvious how to process OTLP messages correctly. But I think in either case there are some takeaways from this issue.

Cheers

lukasmalkmus, Aug 06 '24 09:08

I'm happy to help rephrase some of the comments/docs I've linked but I need some guidance on what the correct way actually is before I get going.

yeah, it looks like there's some drift, and the spec needs to be updated a bit, if you can send a PR that would be a great way to get the ball rolling and get others to check it out!

trask, Aug 13 '24 20:08

To the best of my knowledge, there are the following errors:

  1. Instrumentation library was renamed to scope. Anything that says InstrumentationLibraryX like InstrumentationLibrarySpans should be ScopeX like ScopeSpans.

  2. The schema_url field in the ResourceSpans message DOES NOT apply to the contained Span and SpanEvent messages. The schema_url in ScopeSpans DOES.

  3. If schema_url field is non-empty both in Resource* message and in the contained InstrumentationLibrary* message then the value in InstrumentationLibrary* message takes the precedence.

    I think this is just outdated
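As a rough Python sketch of points 2 and 3 from a consumer's side: resource attributes are paired with the ResourceSpans schema_url, span and span event attributes with the ScopeSpans schema_url, and there is no precedence or fallback between the two levels. The dataclasses and the attributes_with_schema function are hypothetical stand-ins (not the generated protobuf classes), and the example values are made up.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the OTLP messages, reduced to what the sketch needs.
@dataclass
class Event:
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


@dataclass
class ScopeSpans:
    spans: list
    schema_url: str = ""


@dataclass
class ResourceSpans:
    resource_attributes: dict
    scope_spans: list
    schema_url: str = ""


def attributes_with_schema(msg):
    """Yield (kind, attributes, schema_url) for each attribute set in the message.

    Assumes the reading given above: no fallback from scope level to resource level.
    """
    yield "resource", msg.resource_attributes, msg.schema_url
    for scope_spans in msg.scope_spans:
        for span in scope_spans.spans:
            yield "span", span.attributes, scope_spans.schema_url
            for event in span.events:
                yield "event", event.attributes, scope_spans.schema_url


# Example data; all values are illustrative.
example = ResourceSpans(
    resource_attributes={"service.name": "checkout"},
    schema_url="https://opentelemetry.io/schemas/1.24.0",
    scope_spans=[
        ScopeSpans(
            schema_url="https://opentelemetry.io/schemas/1.21.0",
            spans=[
                Span(
                    "GET /cart",
                    {"http.request.method": "GET"},
                    [Event("exception", {"exception.type": "ValueError"})],
                )
            ],
        )
    ],
)

for kind, attrs, url in attributes_with_schema(example):
    print(kind, attrs, url)
```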

Additionally:

It doesn't talk about links which I guess it must apply to, too?

A linked span is just the span context to help you find the span in your backend. That span should have a resource from whatever ScopeSpans message is associated with it. A link is just a pointer. The linked span may be in the same ScopeSpans message, but may also not be.

It apparently doesn't apply to the scope, which (a) seems inconsistent with how the ResourceSpans schema_url behaves (applying to resource) and (b) leaves me wondering which schema the scope is recorded in

It is a bit inconsistent in the way the data is laid out, but it just avoids duplication because all spans in a scope should have the same schema url. The resource only has one set of attributes, so the resource and schema_url can be at the same level in the tree. Each scope can have many spans with their own unique attributes and events, which in turn may have attributes. Because all spans and span attributes within a scope should have the same schema_url, the url is at the scope level of the tree.

dyladan, Aug 13 '24 20:08

yeah, it looks like there's some drift, and the spec needs to be updated a bit, if you can send a PR that would be a great way to get the ball rolling and get others to check it out!

Sorry, I was working on other things. Will circle back to this and try to create a PR to clear up some confusion.

It is a bit inconsistent in the way the data is laid out, but it just avoids duplication because all spans in a scope should have the same schema url. The resource only has one set of attributes, so the resource and schema_url can be at the same level in the tree. Each scope can have many spans with their own unique attributes and events, which in turn may have attributes. Because all spans and span attributes within a scope should have the same schema_url, the url is at the scope level of the tree.

Thanks, that clarifies quite a bit!

A linked span is just the span context to help you find the span in your backend. That span should have a resource from whatever ScopeSpans message is associated with it. A link is just a pointer. The linked span may be in the same ScopeSpans message, but may also not be.

Yes, I'm aware. I was referring to the fact that a link can be annotated with attributes, just like an event. E.g. see your own explanation:

Each scope can have many spans with their own unique attributes and events, which in turn may have attributes.

[...] which in turn may have attributes

This is also true for links. They can have their own attributes. Obviously they apply to the link and not the span the link points to.

My confusion stems from the fact that events get an explicit mention in the spec, while links do not:

The schema_url field in the InstrumentationLibrarySpans message applies to the contained Span and SpanEvent messages.

I assume this should include SpanLink, too, no?
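If that assumption holds, a consumer would treat link attributes the same way as event attributes. A minimal Python sketch, where the dataclasses and the function name are hypothetical and the rule itself is not confirmed anywhere in this thread:

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins, not the generated protobuf classes.
@dataclass
class Link:
    trace_id: str
    span_id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)


def link_attributes_with_schema(spans, scope_schema_url):
    """Pair each link's own attributes with the enclosing ScopeSpans schema_url.

    ASSUMPTION (unconfirmed): links are covered by the scope-level schema_url,
    just like spans and span events. The attributes describe the link itself,
    not the span the link points to.
    """
    return [
        (link.attributes, scope_schema_url)
        for span in spans
        for link in span.links
    ]
```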

lukasmalkmus, Feb 07 '25 11:02