
Closing and exporting all spans on panic (when panic=abort)?

Open: joshtriplett opened this issue 3 years ago

I'm using tracing-opentelemetry. From what I understand, spans rely on Drop implementations to close, and until the span is closed it won't be shipped off. I'm currently using panic=abort for various reasons. I'd love to have a global method that I can call from a panic handler that will close and export all spans before exiting, to make sure enough tracing information has been sent out to debug the panic.
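A std-only sketch of the problem described above. It uses a hypothetical `DropSpan` type, not the real `tracing` or OpenTelemetry span types, and that span is ended only in `Drop`. With the default `panic = "unwind"`, unwinding runs both `Drop` impls, innermost first. With `panic = "abort"` the process aborts right after the panic hook returns, so neither `Drop` would run and neither span would be ended or exported.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a span that is "closed" only in `Drop`.
pub struct DropSpan {
    name: &'static str,
    ended: Arc<Mutex<Vec<&'static str>>>,
}

impl DropSpan {
    pub fn start(name: &'static str, ended: Arc<Mutex<Vec<&'static str>>>) -> Self {
        DropSpan { name, ended }
    }
}

impl Drop for DropSpan {
    fn drop(&mut self) {
        // Runs on normal scope exit and while unwinding, but not if the
        // process aborts. Recover from poisoning instead of panicking in Drop.
        let mut ended = self.ended.lock().unwrap_or_else(|e| e.into_inner());
        ended.push(self.name);
    }
}

/// Panics inside two nested spans (with the default panic = "unwind")
/// and returns the order in which their `Drop` impls ran.
pub fn spans_ended_after_panic() -> Vec<&'static str> {
    let ended = Arc::new(Mutex::new(Vec::new()));
    let ended_in = Arc::clone(&ended);
    let result = panic::catch_unwind(move || {
        let _outer = DropSpan::start("outer", Arc::clone(&ended_in));
        let _inner = DropSpan::start("inner", ended_in);
        panic!("boom");
    });
    assert!(result.is_err());
    // Bind to a local so the MutexGuard temporary is dropped before `ended`.
    let names = ended.lock().unwrap().clone();
    names
}
```

Under unwinding, `spans_ended_after_panic()` returns `["inner", "outer"]`.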

joshtriplett, May 08 '21

This could be challenging. A Span holds a weak reference to the TracerProvider (via its Tracer), but not the other way around: the TracerProvider keeps no reference to the spans it has created. So it's going to be hard to find all open spans and export them.
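Here is a heavily simplified model of the ownership direction described above. `Provider`, `Tracer`, `Span` and their fields are hypothetical stand-ins, not the SDK's actual types. Each span keeps only a `Weak` pointer back to the provider, and the provider keeps no list of its spans. As a result, a span reaches the exporter only from its own `Drop`. Calling `std::mem::forget` on a `Span` mimics a span whose `Drop` never runs, as under `panic = "abort"`: its name never shows up in `exported()`.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical, heavily simplified model; not the SDK's real types.
pub struct ProviderInner {
    exported: Mutex<Vec<String>>,
}

pub struct Provider {
    inner: Arc<ProviderInner>, // the only strong owner
}

pub struct Tracer {
    provider: Weak<ProviderInner>,
}

pub struct Span {
    name: String,
    provider: Weak<ProviderInner>,
}

impl Provider {
    pub fn new() -> Self {
        Provider { inner: Arc::new(ProviderInner { exported: Mutex::new(Vec::new()) }) }
    }

    pub fn tracer(&self) -> Tracer {
        Tracer { provider: Arc::downgrade(&self.inner) }
    }

    /// There is no field listing live spans, so nothing here could
    /// iterate over in-flight spans and end them.
    pub fn exported(&self) -> Vec<String> {
        self.inner.exported.lock().unwrap().clone()
    }
}

impl Tracer {
    pub fn start(&self, name: &str) -> Span {
        Span { name: name.to_string(), provider: self.provider.clone() }
    }
}

impl Drop for Span {
    fn drop(&mut self) {
        // The span reaches the provider only from its own Drop, and only
        // if the provider is still alive.
        if let Some(provider) = self.provider.upgrade() {
            provider.exported.lock().unwrap().push(self.name.clone());
        }
    }
}
```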

Besides the spans that are still in flight, you also need to export the ended spans cached in the BatchSpanProcessor, so it may make sense to call force_flush on the TracerProvider in the panic handler.
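A minimal sketch of the force-flush-from-the-panic-hook idea. It is written against `std` only because the SDK's own `force_flush` signature has varied between opentelemetry-rust releases. `BUFFER`, `EXPORTED`, `record_ended_span` and `force_flush` are hypothetical stand-ins for a batch processor's queue and its exporter. The key point is that the panic hook runs before the process aborts, even with `panic = "abort"`, so a synchronous flush there still gets a chance to run. An exporter that depends on a background thread or an async runtime may not finish before the abort.

```rust
use std::panic;
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-ins for a batch processor's queue and its exporter.
static BUFFER: Mutex<Vec<String>> = Mutex::new(Vec::new());
static EXPORTED: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Called when a span ends: it is queued, not exported immediately.
pub fn record_ended_span(name: &str) {
    BUFFER.lock().unwrap_or_else(|e| e.into_inner()).push(name.to_string());
}

/// Drains the queue synchronously and returns how many spans were exported.
pub fn force_flush() -> usize {
    // try_lock: if this thread panicked while holding the lock, a blocking
    // lock() inside the hook would deadlock. The trade-off is that the flush
    // is skipped if another thread happens to hold the lock right now.
    let mut buffer = match BUFFER.try_lock() {
        Ok(guard) => guard,
        Err(TryLockError::Poisoned(poisoned)) => poisoned.into_inner(),
        Err(TryLockError::WouldBlock) => return 0,
    };
    let drained: Vec<String> = buffer.drain(..).collect();
    let count = drained.len();
    EXPORTED.lock().unwrap_or_else(|e| e.into_inner()).extend(drained);
    count
}

/// Chains a flush in front of whatever panic hook was installed before.
/// The hook runs before the process aborts, even with panic = "abort".
pub fn install_flush_on_panic_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        force_flush();
        previous(info);
    }));
}

pub fn exported() -> Vec<String> {
    EXPORTED.lock().unwrap_or_else(|e| e.into_inner()).clone()
}
```

This only covers spans that have already ended. Spans that are still open when the panic hits are not in the buffer at all.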

TommyCpp, May 09 '21

Logging active spans on crash, and logging spans that start but never end, are pretty important requirements for a tracing/logging system.

> This could be challenging. The Span holds a weak reference to TracerProvider via Tracer. But not the other way around.

This seems like there is a design issue somewhere.

blueforesticarus, Mar 17 '23

> Logging active spans on crash, and logging spans that start but never end, are pretty important requirements for a tracing/logging system.

I am not able to find any other OpenTelemetry implementation that does this automatically. For example, OpenTelemetry .NET has a doc describing one possible way for users to achieve this, but the SDK does not do it automatically: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/reporting-exceptions#unhandled-exception Even this suggestion only finishes and exports the spans in the current context. Spans in other threads/contexts remain unended and unexported.
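The limitation described above can be reproduced in a small `std`-only sketch. `ACTIVE`, `enter_span`, `end_current_context_spans` and `ENDED` are hypothetical names, and a thread-local stack stands in for the active span context. A Rust panic hook runs on the panicking thread, so it can only see that thread's context. A span entered on another thread is never ended.

```rust
use std::cell::RefCell;
use std::panic;
use std::sync::Mutex;

// Hypothetical sink standing in for "finish and export the span".
static ENDED: Mutex<Vec<String>> = Mutex::new(Vec::new());

thread_local! {
    // Each thread's stack of currently entered spans (its "context").
    static ACTIVE: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

pub fn enter_span(name: &str) {
    ACTIVE.with(|active| active.borrow_mut().push(name.to_string()));
}

/// Ends the spans of the calling thread only.
pub fn end_current_context_spans() {
    ACTIVE.with(|active| {
        // try_borrow_mut: the panic may have fired while the stack was borrowed.
        if let Ok(mut stack) = active.try_borrow_mut() {
            let mut ended = ENDED.lock().unwrap_or_else(|e| e.into_inner());
            // Innermost span first, as unwinding would do.
            while let Some(name) = stack.pop() {
                ended.push(name);
            }
        }
    });
}

/// The panic hook runs on the panicking thread, so only that thread's
/// spans are ended; spans entered on other threads stay open.
pub fn install_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        end_current_context_spans();
        previous(info);
    }));
}

pub fn ended() -> Vec<String> {
    ENDED.lock().unwrap_or_else(|e| e.into_inner()).clone()
}
```

Suppose a worker thread calls `enter_span("worker-span")`, and the main thread later panics after `enter_span("request")`. In that case, `ended()` contains only `"request"`.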

cijothomas, Mar 17 '23

Writing things to a file doesn't have the second problem, and if you do things right it doesn't have the first.

I'm using tracing to output to OpenTelemetry. tracing has hooks for when spans start, and it keeps working inside a panic hook. I get hundreds of log lines on stdout, but NOTHING ends up being sent to Jaeger/Zipkin via OpenTelemetry, because of an error 30 s into startup.

I don't know whether it is a design or an implementation issue, but all the monitoring "solutions" I have come across (Jaeger, Zipkin, OpenTelemetry, etc.) fail to deal with unfinished spans, seemingly because starting a span doesn't emit anything. To me this is a basic viability issue for OpenTelemetry as a whole.

Not trying to bash the maintainers here (and it seems like a wider issue anyway), just want to point out that

  1. This is a BIG problem
  2. I don't think I'm the only person who will lose a few hours banging their head against a wall thinking "surely there is a way to do this"
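One way to make unfinished spans visible is to emit a record when a span starts, not only when it ends. In the OpenTelemetry SDKs the built-in span processors hand a span to the exporter when it ends. A start record would therefore need a side channel, for example a custom span processor's `on_start` writing to a local file, as suggested earlier in the thread. The `Event` type and `unfinished` function below are hypothetical. The sketch only shows how a reader could recover spans that started but never ended from whatever records survived a crash.

```rust
use std::collections::BTreeMap;

/// Hypothetical records a side channel could write as they happen,
/// instead of one record per span at end time.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start { id: u64, name: String },
    End { id: u64 },
}

/// Replays whatever records survived the crash and returns the names of
/// spans that were started but never ended, ordered by span id.
pub fn unfinished(events: &[Event]) -> Vec<String> {
    let mut open: BTreeMap<u64, String> = BTreeMap::new();
    for event in events {
        match event {
            Event::Start { id, name } => {
                open.insert(*id, name.clone());
            }
            Event::End { id } => {
                open.remove(id);
            }
        }
    }
    open.into_values().collect()
}
```

Replaying `[Start { id: 1, name: "startup" }, Start { id: 2, name: "load_config" }, End { id: 2 }]` yields `["startup"]`.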

blueforesticarus, Mar 17 '23

> The Span holds a weak reference to TracerProvider via Tracer. But not the other way around.

Conceptually, why is this? Could we do it the other way around?
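Here is one possible answer to this question, sketched with hypothetical `Registry` and `TrackedSpan` types. Nothing like this is provided by opentelemetry-rust. In this design the provider side owns the data of every open span, and span handles point back at it. A panic hook could then call `end_all_open()` to end the open spans on every thread, marking each one as not ended normally. The trade-off is a shared lock on every span start and end.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical provider-side registry: it owns the data of every open
/// span, and span handles point back at it.
#[derive(Default)]
pub struct Registry {
    next_id: AtomicU64,
    open: Mutex<HashMap<u64, String>>,
    // (span name, ended normally?)
    exported: Mutex<Vec<(String, bool)>>,
}

pub struct TrackedSpan {
    id: u64,
    registry: Arc<Registry>,
}

// Recover from poisoning so this also works from inside a panic hook.
fn lock<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|e| e.into_inner())
}

impl Registry {
    pub fn start(registry: &Arc<Registry>, name: &str) -> TrackedSpan {
        let id = registry.next_id.fetch_add(1, Ordering::Relaxed);
        lock(&registry.open).insert(id, name.to_string());
        TrackedSpan { id, registry: Arc::clone(registry) }
    }

    /// Intended for a panic hook: ends every span still open, on any thread,
    /// and marks it as not ended normally. Returns how many were ended.
    pub fn end_all_open(&self) -> usize {
        let drained: Vec<String> = lock(&self.open).drain().map(|(_, name)| name).collect();
        let count = drained.len();
        lock(&self.exported).extend(drained.into_iter().map(|name| (name, false)));
        count
    }

    pub fn exported(&self) -> Vec<(String, bool)> {
        lock(&self.exported).clone()
    }
}

impl Drop for TrackedSpan {
    fn drop(&mut self) {
        // Normal end; a no-op if end_all_open already took this span.
        let name = lock(&self.registry.open).remove(&self.id);
        if let Some(name) = name {
            lock(&self.registry.exported).push((name, true));
        }
    }
}
```

Spans that end normally are exported with `true`. Spans collected by `end_all_open()` are exported with `false`, and a `TrackedSpan` dropped afterwards is not exported a second time.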

djc, Mar 21 '23

Related to https://github.com/open-telemetry/opentelemetry-rust/issues/1209

cijothomas, Oct 24 '23