Allow SpanExporter to write partially completed spans during shutdown

Open jonathanjyi opened this issue 2 years ago • 4 comments

What are you trying to achieve? Instantaneous exporting of Spans/Traces as they occur, such as when a new Event is added to a Span.

What did you expect to see? Exporters outputting Spans/Traces as they see fit, not dictated by when the Span or Trace has "ended".

Additional context. I'd like to capture crashes up to the last moment possible. Currently, if a Desktop Application crashes with hundreds of Spans with Events in flight, all that data is lost. If the Exporter controlled when to output the Events/Spans/Traces to a persistent file location, we could capture the events that led to the crash. However, since the Export Handler is only called when a Span is Ended(), this is currently not possible.

An alternative approach is to have a global exception handler to unravel each Span and close them, but that's not possible for all Applications.

I know I could write my own Processor and Exporter to have a cache of Spans in flight at startup, but is there a better way to handle this that I'm not aware of?
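For illustration, the "cache of Spans in flight" idea could be sketched as an append-only journal that is flushed on every start, event, and end, so the file still describes the open spans after a crash. This is a minimal sketch with hypothetical names (`JournalingSpanRecorder`, `unfinished_spans`); it does not use the OpenTelemetry SDK, and `on_event` is a hook this sketch assumes rather than one the span processor interface defines.

```python
import json
import os
import time


class JournalingSpanRecorder:
    """Hypothetical stand-in for a processor/exporter pair (not an SDK class).

    Every record is flushed and fsync'ed immediately, so the journal
    survives an abrupt process exit.
    """

    def __init__(self, path):
        self._file = open(path, "a", encoding="utf-8")

    def _write(self, record):
        record["ts"] = time.time()
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # push the line to disk right away

    def on_start(self, span_id, name):
        self._write({"kind": "start", "span_id": span_id, "name": name})

    def on_event(self, span_id, event_name):
        # Assumed hook: called whenever an event is added to a span.
        self._write({"kind": "event", "span_id": span_id, "event": event_name})

    def on_end(self, span_id):
        self._write({"kind": "end", "span_id": span_id})

    def close(self):
        self._file.close()


def unfinished_spans(path):
    """Replay the journal and return spans that started but never ended."""
    open_spans = {}
    with open(path, encoding="utf-8") as journal:
        for line in journal:
            record = json.loads(line)
            span_id = record["span_id"]
            if record["kind"] == "start":
                open_spans[span_id] = {"name": record["name"], "events": []}
            elif record["kind"] == "event" and span_id in open_spans:
                open_spans[span_id]["events"].append(record["event"])
            elif record["kind"] == "end":
                open_spans.pop(span_id, None)
    return open_spans
```

On the next startup, `unfinished_spans(path)` would return the spans that were still open when the process died, along with their recorded events. The per-record `fsync` trades throughput for durability; a real implementation would likely need to batch or tune this.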

jonathanjyi avatar Jul 24 '22 23:07 jonathanjyi

I think this is #373.

Oberon00 avatar Jul 26 '22 12:07 Oberon00

Discussed during triage.

We see two main problems listed here:

  1. Long-running spans (covered in #373), it's hard to find and know if these exist and they can get lost/consume memory.
  2. Crash-behavior of SDK (e.g. I want to know what spans were in flight when crashing).

We think #373 is a proposed solution for both (1) and (2); however, this bug is mostly about (2).

We think there's room in the current OTEL specification to solve this problem with modification and look forward to proposals here.

jsuereth avatar Jul 29 '22 15:07 jsuereth

> it's hard to find and know if these exist and they can get lost/consume memory

I disagree. They are very easy to find since there is OnStart, which gets a readable & writable reference to the span object and storing that reference is allowed & should reflect updates; so you can also check whether that span has ended (or maintain a list in cooperation with OnEnd). Additional memory consumption from this list might be a concern but I think the set of languages where you have both no weak references and can clean up all resources of the span without an explicit call to End is very small (also, the API spec explicitly allows implementations to leak memory if end is not called).
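The OnStart/OnEnd bookkeeping described above might look roughly like the following. This is a sketch under stated assumptions: `Span` and `InFlightSpanProcessor` are hypothetical minimal classes, not the OpenTelemetry SDK types, and the weak set is used so that a span dropped without `end()` does not stay alive through the registry.

```python
import threading
import weakref


class Span:
    """Hypothetical minimal span; stands in for an SDK's read/write span."""

    def __init__(self, name, processor):
        self.name = name
        self.events = []
        self.ended = False
        self._processor = processor
        processor.on_start(self)

    def add_event(self, event_name):
        self.events.append(event_name)

    def end(self):
        if not self.ended:
            self.ended = True
            self._processor.on_end(self)


class InFlightSpanProcessor:
    """Tracks spans that have started but not yet ended."""

    def __init__(self):
        self._lock = threading.Lock()
        # Weak references: the registry alone never keeps a span alive.
        self._in_flight = weakref.WeakSet()

    def on_start(self, span):
        with self._lock:
            self._in_flight.add(span)

    def on_end(self, span):
        with self._lock:
            self._in_flight.discard(span)

    def snapshot(self):
        """Return (name, events) for every span currently in flight."""
        with self._lock:
            return [(s.name, list(s.events)) for s in self._in_flight]
```

Because the processor stores a reference to the live span object, `snapshot()` reflects events added after `on_start` was called, which is the property the comment above relies on.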

Does this seem all too convenient? 😃 I have to admit that I was involved in these spec parts and had the use case from #373 in mind when specifying certain details.

But I fully agree that this issue should probably be used to discuss the crash behavior (a separate, partially related issue), not what I wrote about above.

Oberon00 avatar Jul 29 '22 15:07 Oberon00

The way I see this matter, it's not that we have a problem with any one of the underlying mechanisms -- the span processor, the span exporter, and the span sampler -- it's that to do anything sophisticated (which many users do), you need to delicately combine all three of those abstractions into a single implementation. We have seen how, for example, a rate-limited consistent span sampler needs implementation-level control over the span processor and exporter:

https://github.com/open-telemetry/opentelemetry-java-contrib/pull/352

Moreover, when it comes to sampling, users are all looking for a configurable way to decide which spans are and are not sampled, but it is much easier to simply filter spans or drop them in a processor/exporter:

https://github.com/open-telemetry/opentelemetry-go-contrib/pull/2572

It occurs to me that for the Metrics SDK, OpenTelemetry set a user-level objective in its Views specification: users will be able to configure which metrics do and do not report, and how, and at what time granularity, and with which dimensions, and so on -- all of this is configurable with a View. For OpenTelemetry's Trace SDK, what I think we need are not more features for the low-level Span Processor, Exporter, and Sampler (e.g., SamplerProvider: https://github.com/open-telemetry/opentelemetry-specification/pull/2555) -- what I think we need is a Span Views specification and an implementation that does everything at once.

jmacd avatar Jul 29 '22 16:07 jmacd