oteps icon indicating copy to clipboard operation
oteps copied to clipboard

Propose Context-scoped attributes.

Open Oberon00 opened this issue 3 years ago • 53 comments
trafficstars

Add Context-scoped telemetry attributes which typically apply to all signals associated with a trace as it crosses a single service.

This OTEP aims to address various related demands that have been brought up in the past, where the scope of resource attributes is too broad, but the scope of span attributes is too narrow. For example, this happens where there is a mismatch between the OpenTelemetry SDK’s (and thus TracerProvider’s, MeterProvider’s) process-wide initialization and the semantic scope of a (sub)service.

The context-scoped attributes allows you to attach attributes to all telemetry signals emitted within a Context (or, following the usual rules of Context values, any child context thereof unless overridden). Context-scoped attributes are normal attributes, which means you can use strings, integers, floating point numbers, booleans or arrays thereof, just like for span or resource attributes. Context-scoped attributes are associated with all telemetry signals emitted while the Context containing the Context-scoped attributes is active and are available to telemetry exporters. For spans, the context within which the span is started applies. Like other telemetry APIs, Context-scoped attributes are write-only for applications. You cannot query the currently set Context-scoped attributes, they are only available on the SDK level (e.g. to telemetry exporters, Samplers and SpanProcessors).

Context-scoped attributes should be thought of equivalent to adding the attribute directly to each single telemetry item it applies to.


Rendered: https://github.com/dynatrace-oss-contrib/oteps/blob/feature/context-scoped-attributes/text/0207-context-scoped-attributes.md Note: A table of contents can be found under the icon at the top left corner, if you go to the rendered view.

Oberon00 avatar Jun 13 '22 16:06 Oberon00

(Note that the check-errors step seems stuck at "Waiting for status to be reported")

Oberon00 avatar Jun 14 '22 07:06 Oberon00

There are many possible design variations on these attributes. What you suggest as workaround with "setting the attributes on the span and then on the Context" could maybe be avoided with Alternative: Associating Context-scoped attributes with the span on end and setting attributes on parent Contexts, in addition to generally using the span as primary/only storage as you say (I think this is Alternative: Associating attributes indirectly through spans). Then you would just set the attributes on the Context and activate it, and they would be associated with the span on end. Less work, but also less intuitive and the attributes are not available to samplers / OnStart callbacks.

Oberon00 avatar Jun 15 '22 09:06 Oberon00

Mentioned briefly in spec meeting today. @carlosalberto suggests to implement a prototype in some language.

I think for spans this should even be doable without changing core API/SDK code, just by using the SpanProcessor's OnStart callback.

Oberon00 avatar Jun 21 '22 16:06 Oberon00

In Ruby, we have implemented a similar mechanism using Context to propagate attributes to HTTP client spans. We've also implemented this for Redis client spans.

An example use from the Koala instrumentation, which instruments a Facebook API client, is:

          def graph_call(path, args = {}, verb = 'get', options = {}, &post_processing)
            OpenTelemetry::Common::HTTP::ClientContext.with_attributes('peer.service' => 'facebook', 'koala.verb' => verb, 'koala.path' => path) do
              super
            end
          end

Faraday is a HTTP client library. It's instrumentation merges in the attributes from the HTTP::ClientContext with:

          def span_creation_attributes(http_method:, url:)
            instrumentation_attrs = {
              'http.method' => http_method,
              'http.url' => url.to_s,
              'net.peer.name' => url.host
            }
            config = Faraday::Instrumentation.instance.config
            instrumentation_attrs['peer.service'] = config[:peer_service] if config[:peer_service]
            instrumentation_attrs.merge!(
              OpenTelemetry::Common::HTTP::ClientContext.attributes
            )
          end

Honeycomb has also implemented a similar mechanism in their "hello world" examples, called CarryOn

  module CarryOn
    extend self


    # the key under which attributes will be stored for CarryOn within a context
    CONTEXT_KEY = ::OpenTelemetry::Context.create_key('carry-on-attrs')
    private_constant :CONTEXT_KEY


    # retrieve the CarryOn attributes from a given or current context
    def attributes(context = nil)
      context ||= ::OpenTelemetry::Context.current
      context.value(CONTEXT_KEY) || {}
    end


    # return a new Context with the attributes given set within CarryOn
    def with_attributes(attributes_hash)
      attributes_hash = attributes.merge(attributes_hash)
      ::OpenTelemetry::Context.with_value(CONTEXT_KEY, attributes_hash) { |c, h| yield h, c }
    end
  end


  # The CarryOnSpanProcessor reads attributes stored under a custom CarryOn key
  # in the span's parent context and adds them to the span.
  #
  # Add this span processor to the pipeline and then keys and values
  # added to CarryOn will appear on subsequent child spans for a trace only
  # within this service. CarryOn attributes do NOT propagate outside this process.
  class CarryOnSpanProcessor < OpenTelemetry::SDK::Trace::SpanProcessor
    def on_start(span, parent_context)
        span.add_attributes(CarryOn.attributes(parent_context))
    end
  end

fbogsany avatar Jun 21 '22 16:06 fbogsany

Yes, CarryOn looks exactly like what this proposes (though for Spans only). Does that satisfy the prototype condition, @carlosalberto?

Oberon00 avatar Jun 28 '22 14:06 Oberon00

(close & reopen because CI still seems stuck)

Oberon00 avatar Jun 28 '22 14:06 Oberon00

G'day! ~~I suspect the name Scoped Resources might better match your intent. The word "context" strongly suggests the execution path to Go programmers, and "attributes" suggests to me a shorter lifespan than you've proposed for these key-value pairs eg. the name of a sub-service for the lifetime of the process.~~

Separately, I can't see how this proposal satisfies the needs for which I proposed a BeforeEnd callback, eg. adding a span attribute for the CPU work done by the thread during the span, or recording ambient but changing information about the system eg. RAM usage. Perhaps I'm missing something. Could you clarify how your proposal would satisfy those use cases?

garthk avatar Jul 02 '22 01:07 garthk

I had a nagging thought I'd somehow mis-read the spec, came back, and confirmed it. I think I got confused by the scope attributes comparison, but that's on me; I have no suggested edits for clarity. I've struck out my first paragraph above. Do set me straight if I've similarly blundered on the second.

Under OTLP protocol changes, I'd prefer not to treat these attributes as resources. That might seem appropriate for the http.app example but I'm more tempted to use this feature for enduser.id et al.

garthk avatar Jul 02 '22 08:07 garthk

eg. adding a span attribute for the CPU work done by the thread during the span, or recording ambient but changing information about the system eg. RAM usage. Perhaps I'm missing something. Could you clarify how your proposal would satisfy those use cases?

It doesn't satisfy these, the BeforeEnd callback would still be useful for these. But it solves problems closely related to the original problem the BeforeEnd callback was proposed for, namely attaching attributes to (nearly) all spans of a trace. I also doesn't do that 100% in the way the original author of https://github.com/open-telemetry/opentelemetry-specification/issues/1089 proposed (solutions closer to that are discussed in the alternatives section of the OTEP though).

Oberon00 avatar Jul 05 '22 09:07 Oberon00

Under OTLP protocol changes, I'd prefer not to treat these attributes as resources.

Right, that doesn't make sense. You probably read one of the alternatives, because the main text suggests to implement the attributes such that the Tracers/Meters/... would just attach the attributes directly to the Spans/Metrics as normal attributes, so that exporters won't even be able to tell the difference, so there is no protocol change required. Merging context-scoped attributes with resources is only mentioned as alternative implementation to an optional addition to an alternative 😃

Oberon00 avatar Jul 05 '22 09:07 Oberon00

Indeed I did, and I endorse your conclusion there. Good luck with your OTEP!

garthk avatar Jul 06 '22 08:07 garthk

For the record, Lightstep internally uses a span+metrics instrumentation library that predates OpenTelemetry. In that library we have a Scope construct that is not quite like any of the ones you described, but substantially similar. Briefly described:

A primitive method exists to start a new Span/Scope. The initial Scope variables are configured as attributes of a new Span. During the span's operation, any metric operation automatically inherits the attributes of the scope.

Like the proposal here, these attributes are independent of baggage context, which propagate unconditionally. We did not automatically translate scope attributes to internal, non-service-boundary-crossing span contexts, although a study of the internal code suggests that most of the time, this would be desirable.

There is an Scope.Add() operation to append new scope attributes--new scope attributes are added to the span when it ends, and they impact future metric operations in the current scope. This is somewhat like Associating Context-scoped attributes with the span on end and setting attributes on parent Contexts.

jmacd avatar Jul 13 '22 18:07 jmacd

@bogdandrutu

Baggage is both readable and writable for the application, while execution-scoped attributes, like all other telemetry attributes, are write-only.

Clearly baggage is a super set of this proposal.

That's the "Rectangle extends Square"-kind of flawed superset. I believe, we intentionally designed Span as write-only for the application to not make applications rely on data added to the span. The same issue exists with Baggage vs. Context-scoped attributes.

Baggage only supports string values.

Do we really care about other values?

I do, and I believe many others will. Besides, having limited attribute types would simply be inconsistent, and may cause issues e.g. with defining semantic conventions.

Baggage is not available to telemetry exporters (although e.g., a SpanProcessor could be used to change that), while that's the whole purpose of execution-scoped attributes.

This is a long time request, and can be added.

I guess it can. I would say, defining the precise association (the attributes at which point counts for a Span for example) is maybe the main issue of this proposal. If we define such an association between telemetry items and baggage, and also allow non-propagation and access from span processors, and we add additional metadata to define which of the baggage attributes should be copied to telemetry item's attributes (as we don't want to send potentially sensitive application data to telemetry backends), and if we can live with leaking telemetry data back to applications, then we don't need the additional Context-scoped-attributes concept.

All in all, it seems kind of forced to me to burden the baggage spec with this additional use case. In total it may be more complicated than adding this new context-scoped attributes concept.

Oberon00 avatar Aug 04 '22 12:08 Oberon00

I agree that baggage is not the solution to the problem presented, because baggage is propagated between processes, while context stays within a process.

But I don't like the proposal for a different reason. Context already supports typed attributes. This PR adds a mechanism to elevate some of them to be force-included in the telemetry. I'd argue that it's not the instrumentation's job to make such decision, but a job for configuration mechanism. The same mechanism can select baggage attributes to affect telemetry, if desired.

yurishkuro avatar Aug 04 '22 13:08 yurishkuro

@yurishkuro

This PR adds a mechanism to elevate some of them to be force-included in the telemetry. I'd argue that it's not the instrumentation's job to make such decision

Interesting, I don't see this PR like that at all. In my view, this PR adds a mechanism for instrumentations to add telemetry attributes that are scoped not to the whole application (like resources) an instrumentation scope, or a single item, but to a context. The use case I used in the diagram is a very concrete one for example, that I think could be included in a Java web server / servlet instrumentation. Making something configurable is always possible (e.g. also for span attributes) but might not always make sense.

Context already supports typed attributes. This PR adds a mechanism to elevate some of them to be force-included in the telemetry.

Maybe I should add a section to distinguish this from Context as well. Some points that I can come up with right now:

  • Context keys are not strings but unique objects returned by CreateKey
  • Context values are not attributes but arbitrary objects
  • Contexts are not enumerable
  • Contexts entries are meant to be set & interpreted in-process for particular use cases depending on the entry. They are not "key-value pairs"/attributes (see also previous point)
  • OpenTelemetry recommends language library authors use existing Context concepts instead of creating an otel-specific one. So even if you could enumerate the context, there could be arbitrary not telemetry related stuff there.

So what we could add would be a an API to enlist certain context keys as attributes to be set on telemetry items (if available). This would nearly work as a behind-the-scenes change to the currently proposed API, except that you would need to pre-allocate the keys with a new API, and use the key objects in the AddContextScopedAttributes call. That seems like a more awkward API to support fewer use cases with a more complex implementation.

Oberon00 avatar Aug 04 '22 15:08 Oberon00

This PR adds a mechanism to elevate some of them to be force-included in the telemetry.

Interesting, I don't see this PR like that at all. In my view, this PR adds a mechanism for instrumentations to add telemetry attributes that are scoped not to the whole application (like resources) an instrumentation scope, or a single item, but to a context.

I think you're saying the same thing. The key part in your statement is "telemetry attributes". If this was just about "attributes" that stay in the context only within the process, then it's the definition of the Context. When you add "telemetry" I read it as "force-included in the telemetry".

Maybe I should add a section to distinguish this from Context as well. Some points that I can come up with right now:

No objections to this, but none of these features are required for the goal that you're after. String keys are also valid keys for the context. If we agree that force-including attributes in telemetry is bad and it should be an opt-in configuration, then the existing Context seems perfectly adequate for the goal. The configuration can list a collection of attribute keys, which, if found in the Context, should be included in the telemetry. Lack of enumerability and your other points about the Context do not prevent doing that.

yurishkuro avatar Aug 05 '22 18:08 yurishkuro

@yurishkuro

The key part in your statement is "telemetry attributes"

Indeed. The general context is not meant for (directly storing) telemetry attributes.

The configuration can list a collection of attribute keys, which, if found in the Context, should be included in the telemetry. Lack of enumerability and your other points about the Context do not prevent doing that.

I described how that could look like in my last comment, and as said I believe it would be possible but "a more awkward API to support fewer use cases with a more complex implementation". I can see no advantage at all. Configurability could be later added as an add-on to the currently proposed API as well.

If we agree that force-including attributes in telemetry is bad and it should be an opt-in configuration

No, I don't agree to that and I don't understand what you mean by "force" here (EDIT: So maybe I don't agree because I don't fully get the motivation here). It's "forced" in the same way that span attributes are force-included into exported telemetry data.

Oberon00 avatar Aug 09 '22 09:08 Oberon00

Regarding Baggage (continuing from https://github.com/open-telemetry/oteps/pull/207#issuecomment-1205185260) @bogdandrutu

I thought about it again and I think that depending on the language/client library and the future evolution of baggage, a baggage-based implementation of Context-scoped attributes might indeed make sense. I added that in the alternatives section in 94d344d

Oberon00 avatar Aug 11 '22 16:08 Oberon00

@Oberon00 my main worry here is that we are confusing the users more with adding more concepts. I may be the only one who has this concern, which in that case I am an outlier, but personal preference is to avoid adding new concepts, when we can work on extending already existing once by enhancing them.

bogdandrutu avatar Aug 19 '22 14:08 bogdandrutu

@Oberon00 my main worry here is that we are confusing the users more with adding more concepts. I may be the only one who has this concern, which in that case I am an outlier, but personal preference is to avoid adding new concepts, when we can work on extending already existing once by enhancing them.

I echo this. My first comments was because of the confusion I had.

tigrannajaryan avatar Aug 19 '22 15:08 tigrannajaryan

I agree with @bogdandrutu and @tigrannajaryan .

No, I don't agree to that and I don't understand what you mean by "force" here (EDIT: So maybe I don't agree because I don't fully get the motivation here). It's "forced" in the same way that span attributes are force-included into exported telemetry data.

@Oberon00 These scenarios are not equivalent. Span attributes exist for users to store values in them, that's not "forcing". And by recording an attribute in the span I do not affect other telemetry signals simultaneously. But these context-scoped attributes are meant to affect all telemetry, which is why I refer to them as "forced". Say I import a library that decided to add some context-scoped attributes - suddenly all my telemetry gains extra dimensions, especially the metrics where the cardinality has direct correlation on the ingestion/storage costs.

I guess another way to frame it is like this: the PR proposes an opt-out mechanism, and I prefer opt-in. Context already supports attributes, anyone can add them. An opt-in approach would be a configuration mechanism where the service owner can say "I want these context attributes to be automatically included into these types of telemetry". It would extend the SDK configuration surface, but keep the OTEL API surface (the Context API) intact.

yurishkuro avatar Aug 19 '22 21:08 yurishkuro

@yurishkuro OK, I think I understand your concern with metrics & configurability. The proposed Context-scoped attributes API is giving libraries great power with which comes great responsibility. Adding configurability allows the user to share that responsibility and work around broken instrumentation libraries that add too many attributes.

I do think that the Context API is not well-suited for this as-is though. Configurability is independent from the API used to supply the attributes.

What could be made configurable? Thinking out loud:

  • Either an include-list or exclude-list of attribute keys (user should have the choice which one to use; exclude-list is only possible if attributes are marked as candidates for being telemetry-attributes)
  • There should be a default include/exclude-list and an optional per-signal list that work together (i.e. attributes are filtered by the default list and then after that by the per-signal list).
  • An empty include-list should be possible and mean that the feature is disabled (generally, or for particular signals)
  • Read-access for a custom span/other signal processor could be added so that it could do more complex logic for picking attributes, which would make sense in combination with restrictive include/exclude lists.

I think this is all a nice-to-have though. What would be easier & possible with the currently proposed API: Recommend all libraries (1) use context-scoped attributes very sparingly (2) provide a way of disabling them or even make them opt-in if they are not critical.

I.e., I would rather add the configurability to the libraries and not the SDK, this is also what we have done with span attributes with the Opt-in level and partially the Recommended level in #2522.

And by recording an attribute in the span I do not affect other telemetry signals simultaneously. But these context-scoped attributes are meant to affect all telemetry, which is why I refer to them as "forced".

Actually I think that just spans would be most important to me. I just thought that affecting all telemetry signals would give the most logical & easy concept of "Context-scoped attributes". But I would be OK with defining them to just affect spans if that would be considered less "forced". Would this be preferable?

Oberon00 avatar Aug 22 '22 10:08 Oberon00

@bogdandrutu

my main worry here is that we are confusing the users more with adding more concepts

OK. But wouldn't two simple concepts and one complex concept be about the same mental overhead in this case?

Actually, I just now looked at the Baggage spec and found it described as:

Baggage is used to annotate telemetry, adding context and information to metrics, traces, and logs.

But is that actually true today? That would be exactly the use case here, but I don't think it's actually covered.

Oberon00 avatar Aug 22 '22 10:08 Oberon00

Baggage is used to annotate telemetry, adding context and information to metrics, traces, and logs.

I think ^ is wrong and quite different from W3C Baggage definition.

yurishkuro avatar Aug 22 '22 16:08 yurishkuro

suddenly all my telemetry gains extra dimensions, especially the metrics where the cardinality has direct correlation on the ingestion/storage costs.

Let's not keep repeating this point. This "inflation of cardinality effect" should not be the case.

It's certainly not the case for Prometheus and OpenMetrics-based systems, and the reason is that creating a new scope attribute adds one attribute to one scope, not to every piece of telemetry in the scope. We are still working out the specification, but as shown in https://github.com/open-telemetry/opentelemetry-specification/pull/2703, users are meant to join scope attributes with metrics if they want to at query time. The PromQL group_left operation (e.g.,) specifically supports this expansion, so we do not need to worry about cardinality inflation due to scope attributes; this follows from the OpenMetrics-documented approach to handling target information (which must not inflate cardinality).

Baggage is used to annotate telemetry, adding context and information to metrics, traces, and logs. I think ^ is wrong and quite different from W3C Baggage definition.

Exactly, this is why the work @Oberon00 is doing is important. Somehow, we settled on APIs where there is no OpenTelemetry way to control telemetry attributes scoped to a span or a group of spans and the logs and metrics that happen within. This appears to be a common pattern, and we should try to standardize something to help users IMO.

jmacd avatar Aug 22 '22 18:08 jmacd

I.e., I would rather add the configurability to the libraries and not the SDK, this is also what we have done with span attributes with the Opt-in level and partially the Recommended level in #2522.

@Oberon00 in contrast, I would much rather add it to SDK once as a standard mechanism than to learn different configurations for different libraries that might decide to spam the context with undesired attributes.

Somehow, we settled on APIs where there is no OpenTelemetry way to control telemetry attributes scoped to a span or a group of spans and the logs and metrics that happen within. This appears to be a common pattern, and we should try to standardize something to help users IMO.

@jmacd This isn't an argument that helps choosing between this proposal and the SDK configuration alternative - they both end up resolving the user issue.

yurishkuro avatar Aug 22 '22 19:08 yurishkuro

@yurishkuro you cannot solve this problem with just configuration. You still need an API to supply these attributes. For me the question is only if we extend the baggage API or add a new API. I don't think the context API is suitable for this. And I also don't think the baggage API can be used as-is, we at least need a baggage API that excludes an attribute from the actual baggage. Otherwise, in the absence of configuration, we would fill the the baggage with telemetry attributes that the user sees when enumerating the baggage and that would be propagated.

@jmacd

the reason is that creating a new scope attribute adds one attribute to one scope, not to every piece of telemetry in the scope.

This proposal suggests to add the attribute to every piece of telemetry in the scope though (mainly to make the concept & the implementation simpler).

Maybe we now need to list more concrete use cases to find what is better suited. The two I know and would like to solve are already listed in the OTEP:

  • http.app attribute to determine which application handles a request: Would be set by certain HTTP server instrumentations on the Context containing the HTTP sever root span.
  • faas.id, faas.name: In Azure Functions, a function application hosts multiple functions, and something that normally would be a resource attribute should be set on every span from that particular function. An Azure Functions instrumentation would set it on the Context containing the FaaS root span.

In both cases:

  • The only thing that stops the attributes from being resources is that there could be multiple different values in the scope of a resource
  • The attributes could semantically also be set on a local root span (as is the case currently for faas.id/faas.name), but this has certain limitations, see "Alternative: An implementation for traces on the Backend".

Oberon00 avatar Aug 23 '22 08:08 Oberon00

@Oberon00 I agree that baggage API is not suitable for this purpose, but the Context API is exactly the scope that you're targeting. I agree that there are some issues with using Context API (e.g. it allows all key & value data types), they but seem quite workable, and to me it's still much more preferable to use it and keep it small than to add another API surface and the mental complexity that comes with it.

yurishkuro avatar Aug 23 '22 15:08 yurishkuro

I agree that baggage API is not suitable for this purpose, but the Context API is exactly the scope that you're targeting.

The Context API is missing enumeration. That's a deal-breaker, unless you only want to support explicit include-lists, which would be undesirable. This feature also cannot be added to the Context API since we allow language libraries to use their language's built-in context APIs and those might not support enumeration and cannot be changed by us.

Oberon00 avatar Aug 23 '22 15:08 Oberon00

@bogdandrutu Is there any roadmap or relevant open PRs issues for the Baggage API? Maybe in scope of the OpenCensus transition guidance? Would be interesting to see how extensions that would allow Context-scoped attributes could fit in there.

Oberon00 avatar Aug 29 '22 14:08 Oberon00