opentelemetry-specification Record span-ending exceptions as span attributes instead of span event or log

Somewhat continuing from https://github.com/open-telemetry/opentelemetry-specification/pull/4333#discussion_r1946331112

My understanding is that:

The plan is to get rid of (or at least deprecate, which is close) span events and this is unlikely to change.
Since exceptions on spans are recorded as span events, the current plan for that is to instead record them as logs that are children of the span.

My proposal is that when a span ends with an exception, the usual attributes exception.type/message/stacktrace should be recorded directly on the span instead of creating a child log. Ways this could look:

Allow passing an exception object to Span.End
Change Span.RecordException to record attributes on the span instead of creating an event. This assumes that Span.RecordException is only called when the span is ending, which IIUC is now the recommendation.
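
To make the first option concrete, here is a minimal Python sketch. `SketchSpan` and its `end(exception=...)` parameter are hypothetical stand-ins, not the OpenTelemetry API; only the `exception.*` attribute names come from the existing semantic conventions.

```python
import traceback
from typing import Optional


class SketchSpan:
    """Hypothetical stand-in for a span; not the real OpenTelemetry Span."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: dict = {}
        self.ended = False

    def end(self, exception: Optional[BaseException] = None) -> None:
        # Record the span-ending exception directly as span attributes
        # instead of creating a span event or a child log.
        if exception is not None:
            self.attributes["exception.type"] = type(exception).__name__
            self.attributes["exception.message"] = str(exception)
            self.attributes["exception.stacktrace"] = "".join(
                traceback.format_exception(
                    type(exception), exception, exception.__traceback__
                )
            )
        self.ended = True


span = SketchSpan("fetch-user")
try:
    raise ValueError("user id must be positive")
except ValueError as exc:
    span.end(exception=exc)
```

Everything about the failure ends up on the one span record, so no second signal has to be configured or correlated.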

Reasons to use exception.* span attributes instead of a child log:

1. It's not clear to me how Span.RecordException will work in the SDK if it has to create a log. Will creating a tracer provider require passing a logging provider?
2. Backends won't need to support logs to support working with exceptions on spans.
3. Users won't have to join multiple records in a query to e.g. find spans which match some predicate and had an exception.
4. Very simple to implement, document, and understand.
5. When a single exception bubbles through multiple nested spans, by default each one would create a child log and would have no way of knowing that this would be redundant. This results in the tree of spans and logs looking very ugly:

span1:
    span2:
        span3:
            exception-log3
        exception-log2
    exception-log1

6. https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-exceptions recommends recording some information directly on the span, but:
   1. It doesn't recommend recording exceptions that are handled, especially because the exception message is meant to be recorded as the status description, which requires setting the status code to error. However it's useful to record exceptions that ended the span even if they're expected to be handled and thus the span should not be marked as an error. This is even acknowledged in the proposal to record exceptions as logs: https://github.com/open-telemetry/opentelemetry-specification/blob/db2b6477da32ee28e00dd827c2f94d34f28967fa/oteps/4333-recording-exceptions-on-logs.md?plain=1#L155-L159
   2. error.type and the status description are very generic places to put information, so in some situations they would be better used by something other than the exception. For example, an HTTP client for a specific service might raise a generic APIError which is recorded as exception.type on the span, while the more specific error code contained within could be used for error.type. In general, error.type and the status description shouldn't be overridden when they are already present, but that shouldn't mean that the precise technical exception data gets discarded.
   3. There's currently no way to know if error.type represents the type of an exception that ended a span, or something else. Same with the span status description.
   4. This isn't clear to users. For example, a user may see the error.type attribute on a span and expect to find an attribute like error.message next to it. They have no way of knowing if the exception message is anywhere to be found in the span data if they don't know where to look.
7. There's currently no apparent way to know whether or not a span ended with an exception. As mentioned above, error.type and the span status are ambiguous. The presence of a child log with an exception (which is nontrivial to check) isn't conclusive either, because the exception could have been handled and the span ended normally. exception.* attributes can solve this.
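
The querying argument can be sketched in plain Python. The record shapes and values below are made up purely for illustration; a real backend would express this in its own query language.

```python
# Illustrative records only: the span/log shapes and values are invented for this sketch.
spans_with_attributes = [
    {"span_id": "a1", "name": "GET /users", "attributes": {"exception.type": "KeyError"}},
    {"span_id": "b2", "name": "GET /users", "attributes": {}},
    {"span_id": "c3", "name": "GET /orders", "attributes": {"exception.type": "TimeoutError"}},
]

# The same traces when exceptions are emitted as child logs instead.
plain_spans = [{"span_id": s["span_id"], "name": s["name"]} for s in spans_with_attributes]
exception_logs = [
    {"span_id": "a1", "attributes": {"exception.type": "KeyError"}},
    {"span_id": "c3", "attributes": {"exception.type": "TimeoutError"}},
]


def failed_by_attribute(spans, name):
    # Single collection: the predicate and the exception check hit the same record.
    return [
        s["span_id"]
        for s in spans
        if s["name"] == name and "exception.type" in s["attributes"]
    ]


def failed_by_join(spans, logs, name):
    # Two collections: gather span IDs from the logs first, then join against spans.
    ids_with_exception = {
        log["span_id"] for log in logs if "exception.type" in log["attributes"]
    }
    return [
        s["span_id"]
        for s in spans
        if s["name"] == name and s["span_id"] in ids_with_exception
    ]


by_attribute = failed_by_attribute(spans_with_attributes, "GET /users")
by_join = failed_by_join(plain_spans, exception_logs, "GET /users")
```

Both functions return the same span IDs, but the second needs access to a second signal and a join. As noted above, the join still cannot tell whether the logged exception actually ended the span.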

Responses to possible objections:

What about exceptions that are handled within the span and don't end it?
- I think the plan to use logs in that situation is fine. I see it as different because:
  - If users want to log such exceptions then the implication is that they're interesting events in their own right and would likely be worth logging even without the parent span, and OTel doesn't seem to have a good alternative recommendation for that. A span-ending exception is much more like a property of the span itself.
  - If there are 1000 such exceptions, putting them all inside the span somehow is likely to be a problem.
  - The timestamp of a handled exception is potentially much more interesting, since it's not necessarily at the end of the span.
What about exceptions that aren't errors, i.e. are expected to be handled outside the span? Logs can have a severity to indicate this.
- An attribute like exception.severity can deal with this.
Stacktraces can be very big.
- If the size of a stacktrace is problematic in a span attribute, it will probably be problematic in a log too.
Logs can be configured to e.g. not record certain exceptions.
- Generically reusable wrapper tracer providers can be created to make this kind of configuration easy.

Feb 24 '25 11:02 alexmojaki

Logs can be configured to e.g. not record certain exceptions.

This does not look to be currently possible as there is no way to alter a span event or create a new span inside a span processor.

Feb 24 '25 12:02 pellared

This does not look to be currently possible as there is no way to alter a span event or create a new span inside a span processor.

I don't see any reason that the spec needs to keep this limitation. In any case a wrapper tracer or tracer provider that's spec compliant should still be possible.

Feb 24 '25 12:02 alexmojaki

I don't see any reason that the spec needs to keep this limitation.

The fact that you do not see a reason does not mean that there are no reasons. Please create an issue if you want to validate your proposal. As of now the description has a false statement which should be updated. You can always refer to the created issue in this issue description.

In any case a wrapper tracer or tracer provider that's spec compliant should still be possible.

You can do anything wrapping the API implementations. The fact that something is possible does not mean that it should be done.

Feb 24 '25 13:02 pellared

I've removed the mention of span processors so that there is no false statement.

The fact that something is possible does not mean that it should be done.

This isn't an argument.

Feb 24 '25 13:02 alexmojaki

I've removed the mention of span processors so that there is no false statement.

Could you please rephrase and leave it as a "drawback"? Listing only the benefits of a proposed solution is not a fair and accurate design description. Please also describe other drawbacks.

In any case a wrapper tracer or tracer provider that's spec compliant should still be possible.

The fact that something is possible does not mean that it should be done.

This isn't an argument.

Fair. Let me expand on this. If you add such things on the API layer then you are not able to have fine-grained control on the processor level. For instance, a user may want to export spans to two destinations and only one of them should have attributes/span events redacted. It would also be less performant than doing it on a lower (SDK) level. For reference, I was also looking at wrapping logger and logger provider to support multi log processing pipelines: https://github.com/open-telemetry/opentelemetry-go/pull/5830 and it was quite quickly rejected as a bad design.

Feb 24 '25 13:02 pellared

It would also be less performant than doing it on a lower (SDK) level.

I don't know if this is true. A provider can intercept Span.RecordException (while adding trivial overhead elsewhere) and prevent the stacktrace from being computed in the first place, which can be expensive. So it could be significantly more efficient than a log processor working on existing logs.
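
A rough sketch of what I mean, in Python. Both classes are hypothetical stand-ins rather than real SDK types; the point is only that the wrapper can return before any traceback formatting happens.

```python
import traceback


class RecordingSpan:
    """Hypothetical stand-in for an SDK span that renders the stack trace eagerly."""

    def __init__(self):
        self.recorded = []

    def record_exception(self, exception):
        # Formatting the traceback is the potentially expensive step.
        stacktrace = "".join(
            traceback.format_exception(
                type(exception), exception, exception.__traceback__
            )
        )
        self.recorded.append(
            {
                "exception.type": type(exception).__name__,
                "exception.stacktrace": stacktrace,
            }
        )


class FilteringSpan:
    """Wrapper that drops ignored exception types before any formatting happens."""

    def __init__(self, wrapped, ignored_types):
        self._wrapped = wrapped
        self._ignored_types = tuple(ignored_types)

    def record_exception(self, exception):
        if isinstance(exception, self._ignored_types):
            return  # skipped: the stack trace is never computed
        self._wrapped.record_exception(exception)

    def __getattr__(self, name):
        # Delegate everything else to the wrapped span unchanged.
        return getattr(self._wrapped, name)


inner = RecordingSpan()
span = FilteringSpan(inner, ignored_types=[KeyError])
span.record_exception(KeyError("expected cache miss"))
span.record_exception(RuntimeError("boom"))
```

A processor that runs afterwards can only discard a stack trace that has already been rendered.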

Feb 24 '25 14:02 alexmojaki

What about exceptions that are handled within the span and don't end it?

I think the plan to use logs in that situation is fine. I see it as different because:

Having two ways to record errors/exceptions depending on how they affect the span does not seem like a design which is

Very simple to implement, document, and understand.

In most cases, a span that ended because of an exception/error already has Error status and a description containing the exception/error details. The information added to attributes (or span events) would be redundant to the information already provided in the span description (and logs if exported).

Still, we could have a semantic convention for distinguishing these kinds of exceptions/errors, and then provide a LogRecordProcessor which adds the exception attributes to the span when an exception/error ends the span.

Feb 24 '25 14:02 pellared

It's not clear to me how Span.RecordException will work in the SDK if it has to create a log. Will creating a tracer provider require passing a logging provider?

Where is the recommendation that Span.RecordException has to create a log? (Of course, if there were such a specification, then I can see it'll be challenging to implement for the exact reason you raised - how does Tracer obtain a LoggerProvider!)

Feb 24 '25 16:02 cijothomas

In most cases, a span that ended because of an exception/error already has Error status and a description containing the exception/error details. The information added to attributes (or span events) would be redundant to the information already provided in the span description

Is there a spec for the format of the data in the message or the exception/error? The ask here is for more structured and formal inclusion of this information in the span itself. I agree it would be nice if the information is not duplicated but ultimately it serves somewhat different purposes (I think of the message as something nice to display while the attributes are more structured and intended for querying). IMO it should also be possible to include the stack trace of the exception as seen by the span at least as an opt-in configuration: the stack trace changes as the exception bubbles up through the program so if users are willing to accept the cost it can be worth it to record more than once; it's not strictly duplication.
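
In Python at least, this is easy to observe with the standard `traceback` module. The function names below are just placeholders for three nested layers of a program:

```python
import traceback

depths = {}


def inner():
    raise ValueError("boom")


def middle():
    try:
        inner()
    except ValueError as exc:
        # Frames attached so far: middle -> inner.
        depths["middle"] = len(traceback.extract_tb(exc.__traceback__))
        raise


def outer():
    try:
        middle()
    except ValueError as exc:
        # At least one more frame has been attached while the exception bubbled up.
        depths["outer"] = len(traceback.extract_tb(exc.__traceback__))


outer()
```

A span wrapping `outer` therefore sees a longer stack trace than a span wrapping `middle`, even though it is the same exception object.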

Feb 24 '25 16:02 adriangb

What about exceptions that are handled within the span and don't end it?

I think the plan to use logs in that situation is fine. I see it as different because:

Having two ways to record errors/exceptions depending on how they affect the span does not seem like a design which is

Very simple to implement, document, and understand.

There is no reason for handled errors/exceptions to get any kind of special treatment in relation to the outer span. In this code:

with tracer.start_as_current_span("outer span"):
    with tracer.start_as_current_span("inner span"):
        ...
    print("...")
    log.info("...")
    log.error("...")
    log.exception("...")

we would not consider storing the inner span, the printed message, or the info log directly on the span. The error log just has a different level from the info log. The exception log just has more info than the error log. It's good to have specifications and semantic conventions about things like the names of the exception attributes in the log. But none of these things should affect the outer span, nor should they be affected by the outer span except to set the parent trace/span ID. They are independent signals. The question "what about handled exceptions in a span" doesn't really make sense. Why should the span care?

Now suppose an unhandled exception is raised. The outer span sees it and has an opportunity to record it somewhere, and it should. Should it record it by emitting a log? Why? Just because users have the option to manually catch and log handled exceptions? How is that relevant to this decision?

In most cases, a span that ended because of an exception/error already has Error status and a description containing the exception/error details. The information added to attributes (or span events) would be redundant to the information already provided in the span description (and logs if exported).

I have explained why the error status and description are insufficient. If it often turns out to be redundant, I don't think that's a significant problem, it's not that much data.

Still, we could have a semantic convention for distinguishing these kinds of exceptions/errors, and then provide a LogRecordProcessor which adds the exception attributes to the span when an exception/error ends the span.

What would be the benefit of doing things this way? Why should the tracing SDK automatically emit a log when a span ends with an exception? Again, this is a very different and separate thing from users or instrumentations explicitly logging specific exceptions.

Feb 25 '25 12:02 alexmojaki

It's not clear to me how Span.RecordException will work in the SDK if it has to create a log. Will creating a tracer provider require passing a logging provider?

Where is the recommendation that Span.RecordException has to create a log?

In https://github.com/open-telemetry/opentelemetry-specification/pull/4430 it looks like the plan is to allow opting in to such:

https://github.com/open-telemetry/opentelemetry-specification/blob/8a276ac4bb0a43a0b2ea62e8e6ce90910557bb2b/oteps/4430-span-event-deprecation-plan.md?plain=1#L67-L68

and that the recommendation will be to create a log instead of calling Span.RecordException:

https://github.com/open-telemetry/opentelemetry-specification/blob/8a276ac4bb0a43a0b2ea62e8e6ce90910557bb2b/oteps/4430-span-event-deprecation-plan.md?plain=1#L52-L55

I guess that answers point 1. So in this part of my proposal:

Ways this could look:

Allow passing an exception object to Span.End

Change Span.RecordException to record attributes on the span instead of creating an event. This assumes that Span.RecordException is only called when the span is ending, which IIUC is now the recommendation.

I now suggest forgetting the second option. I'm proposing allowing Span.End(exception). Then the plan to deprecate Span.RecordException can continue, and if using it emits a warning, that warning should recommend using logs for handled exceptions and Span.End(exception) for unhandled ones. Using it is already not recommended for handled exceptions:

https://github.com/open-telemetry/opentelemetry-specification/blob/d5fed63db471bbc5576c3531061f85fd29b3ebbc/specification/trace/exceptions.md#L16-L17

@pellared please take note of that: there's already a difference in the recommendations for handled vs unhandled exceptions.

Feb 25 '25 12:02 alexmojaki

I'm supportive of recording the (terminal) exception on the span attributes and introducing Span.End(Exception | Error):

very clean API story - you could only set one exception and it's unhandled when the span ends.
it's effectively a convenience on top of End(Status.Error, ex.getMessage()) (with more features).
it hides complexity away from users/instrumentations and helps everyone report exceptions consistently.
it allows the implementation details (config options, details captured) to evolve inside OTel SDKs instead of in every instrumentation out there
most importantly it allows users to see "why this span failed" right on that span - this is better than span events - some backends store them separately from spans

I have some minor/resolvable (I believe) concerns:

should we record stack trace on span at all?
- Not by default, but opt-in sounds ok
should instrumentations report exception on the span and exception on the log?
- I believe yes, combined with #4333 we get the following experience:
  - by default:
    - exception message and type are duplicated (when log is recorded)
    - stacktrace appears on the log only when severity is high and does not appear on spans
  - users can customize: choose the threshold for stacktrace on logs or can choose to report them on spans regardless of severity
can we reuse existing attributes/properties:
- error.type for exception.type - no, they are somewhat different. You can have error.type = VersionNegotiationError and exception.type = HttpRequestException or error.type = 404 and no exception.type
- span status description for exception.message - yes and I think we should do it.

Feb 25 '25 15:02 lmolkova

I think I could live with the exception info being written directly as attributes to the span. I do have some concerns about the API becoming very opinionated as to the timing of when exceptions can be reported and how that might impact instrumentation.

I don't know how much the specification needs to explicitly allow for this but I think the instrumentation should remain configurable enough that users can choose how exceptions are handled in general. That would include where they're reported (log events, span events, span attributes, etc.) as well as how they're rendered into attributes. Like I think maybe there should be a recommendation that where reasonable the underlying error is recorded directly onto the span/event/log/whatever so that a processor/exporter can decide how they should be emitted, especially the stacktrace. This is what I'm doing with the Java OTel SDK today, I embed the raw Throwable instance into the Event which flows to a custom SpanExporter that custom renders the attributes. I very much don't want the SDK to dump the entire raw stacktrace into an attribute regardless of where it ultimately ends up.

Feb 25 '25 16:02 HaloFour

span status description for exception.message - yes and I think we should do it.

@lmolkova please see point 6 in the issue description. In particular, we should be able to record exceptions even if the span status is set to OK.

Feb 25 '25 17:02 alexmojaki

In particular, we should be able to record exceptions even if the span status is set to OK.

Could you share a real-life example when span status is not error, but you'd attach a (single) exception to a span?

You should be able to record attributes directly if that's what you want to do in your code, but it seems a controversial and confusing feature to provide on the API surface (end span with exception and OK/UNSET status).

Feb 25 '25 17:02 lmolkova

I'm supportive of recording the (terminal) exception on the span attributes and introducing Span.End(Exception | Error)

+1

Feb 26 '25 03:02 trask

Could you share a real-life example when span status is not error, but you'd attach a (single) exception to a span?

Some libraries return errors/exceptions for flow control that are not regular errors. E.g. maybe this NotFoundException and https://github.com/spring-projects/spring-security/blob/ec3cc66b647d35365c2f165c263f83ba3d27f063/acl/src/main/java/org/springframework/security/acls/afterinvocation/AbstractAclProvider.java#L71-L86? It may be a bad example, I tried to find something quickly.

we should be able to record exceptions even if the span status is set to OK.

I think that these cases are not errors/exceptions in OTel semantics. Therefore, in such cases instrumentation should not add exception.* or error.* attributes.

You should be able to record attributes directly if that's what you want to do in your code, but it seems a controversial and confusing feature to provide on the API surface (end span with exception and OK/UNSET status).

I agree. I do not think it deserves a dedicated API. Moreover, returning exceptions for flow control is commonly considered as an anti-pattern.

Feb 26 '25 08:02 pellared

Could you share a real-life example when span status is not error, but you'd attach a (single) exception to a span?

Some libraries return errors/exceptions for flow control that are not regular errors. E.g. maybe this NotFoundException and https://github.com/spring-projects/spring-security/blob/ec3cc66b647d35365c2f165c263f83ba3d27f063/acl/src/main/java/org/springframework/security/acls/afterinvocation/AbstractAclProvider.java#L71-L86? It may be a bad example, I tried to find something quickly.

No, it's a good example. Same reasoning as here:

https://github.com/open-telemetry/opentelemetry-specification/blob/db2b6477da32ee28e00dd827c2f94d34f28967fa/oteps/4333-recording-exceptions-on-logs.md?plain=1#L155-L159

We actually do this in our SDK for HTTPException in starlette/fastapi: https://github.com/pydantic/logfire/blob/aa09b4b0da9b768d5885f5d98dcd0fd7c0225dbc/logfire/_internal/tracer.py#L358-L367

In Logfire, spans have a log level, equivalent to the severity of OTel logs. Spans with an exception default to level error, as do spans with an error status code. OTel logs go to the same database table and just use their actual severity. Users can then query level = 'error' as a general way to find any kind of error, and the spans/logs are displayed with a red icon.

It was annoying (to both us and customers) that server 4xx errors matched level = 'error' and showed as red, making it harder to find 'real' errors. So now the SDK sets the level of such spans to warning and they show as yellow.

For users of other backends without level/severity in spans, searching for spans with status=ERROR is the only equivalent workflow I know of, and I expect the same problem applies. So to accommodate other backends, our SDK also doesn't set the span status, even though status=ERROR and level=warn should be allowed and our backend users are expected to prefer searching by level over span status.

So options I see to allow doing similar things in a standard way are:

Add a severity to spans, with severity=warn and status=ERROR as a possible combination
Allow a status description without setting the status to ERROR
Allow Span.End(exception, SetStatus=False) while having SetStatus=True as the default.
Set span status to OK on such spans so that it can't be changed to ERROR. I agree this is icky, such a span isn't really OK. But it's the only option I see that doesn't require additional changes to the standard.
Add an exception.severity semantic convention, and don't set it by default, i.e. it must always be less than error if set. The idea now is that searching for status=ERROR includes all exceptions, but status=ERROR and exception.severity = null is the way to find 'actual' errors. This requires extra knowledge and effort from users but it can work.

Feb 26 '25 09:02 alexmojaki

we should be able to record exceptions even if the span status is set to OK.

I think that these cases are not errors/exceptions in OTel semantics. Therefore, in such cases instrumentation should not add exception.* or error.* attributes.

They may not be errors, but they are exceptions. It's useful to record exceptions to know what happened when debugging even if they don't represent errors.

Our first major customer had a problem that their endpoint was returning 422 and they didn't know why. Their code was raising an HTTPException somewhere but they weren't sure where. The standard Python FastAPI instrumentation uses middleware that can't see those exceptions because FastAPI has already handled them before that. So we extended that instrumentation in our SDK to add an extra span which can catch exceptions and record a stacktrace before FastAPI can handle them.

Logging the exception instead could also work in this case, although a span has the slight benefit that it also records the duration of the actual endpoint function separate from middleware and dependencies. And even if it used a log instead of a span, it would still make sense to use a severity of Warn, so that exception wouldn't be an error in OTel semantics.

Feb 26 '25 10:02 alexmojaki

Logging the exception instead could also work in this case, although a span has the slight benefit that it also records the duration of the actual endpoint function separate from middleware and dependencies. And even if it used a log instead of a span, it would still make sense to use a severity of Warn, so that exception wouldn't be an error in OTel semantics.

I meant just emitting a log record instead of adding span events or attributes. We can still have the span. Thanks to trace context correlation it is possible to have both the duration and the exception correlated with this span.

The more I think about it the more problems/confusions I see. I also want to confess that I was wrong here:

I think that these cases are not errors/exceptions in OTel semantics. Therefore, in such cases instrumentation should not add exception.* or error.* attributes.

An exception does not need to be an error. Reference: https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-exceptions.
exception.* attributes seem to be allowed only in span events and logs. Reference: https://opentelemetry.io/docs/specs/semconv/exceptions

I'm supportive of recording the (terminal) exception on the span attributes

Me too

and introducing Span.End(Exception | Error)

However, I am not convinced that this is a reason for changing the Trace API.

Adding something like Span.End(Exception | Error) does not seem to support the case when an exception is thrown which is not an error.
Moreover, this would couple the Trace API to Semantic Conventions which I do not like and find as an anti-pattern.

I would rather propose adding functionality in "semconv libraries" to create exception attributes from an exception. This should be more loosely coupled and flexible. For example in Go, we could add a func ExceptionAttributes(err error) []attribute.KeyValue to https://pkg.go.dev/go.opentelemetry.io/otel/semconv/v1.30.0. It could even reuse https://pkg.go.dev/go.opentelemetry.io/otel/semconv/v1.30.0#ExceptionMessage and https://pkg.go.dev/go.opentelemetry.io/otel/semconv/v1.30.0#ExceptionType.
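
For comparison, a Python analogue of such a helper might look like the sketch below. `exception_attributes` and its `include_stacktrace` flag are hypothetical names, not part of any existing semconv package; the opt-in flag reflects the stack trace discussion earlier in this thread.

```python
import traceback


def exception_attributes(exc: BaseException, include_stacktrace: bool = False) -> dict:
    """Build exception.* attributes from an exception (hypothetical helper)."""
    exc_type = type(exc)
    module = exc_type.__module__
    # Builtin exceptions are reported without the "builtins." prefix.
    if module == "builtins":
        type_name = exc_type.__qualname__
    else:
        type_name = f"{module}.{exc_type.__qualname__}"
    attributes = {"exception.type": type_name, "exception.message": str(exc)}
    if include_stacktrace:
        attributes["exception.stacktrace"] = "".join(
            traceback.format_exception(exc_type, exc, exc.__traceback__)
        )
    return attributes


try:
    int("not a number")
except ValueError as exc:
    basic = exception_attributes(exc)
    full = exception_attributes(exc, include_stacktrace=True)
```

The caller would then pass the returned mapping to whatever attribute-setting call the span, log, or event offers, which keeps the Trace API itself unaware of the convention.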

Feb 26 '25 11:02 pellared

Logging the exception instead could also work in this case, although a span has the slight benefit that it also records the duration of the actual endpoint function separate from middleware and dependencies. And even if it used a log instead of a span, it would still make sense to use a severity of Warn, so that exception wouldn't be an error in OTel semantics.

I meant just emitting a log record instead of adding span events or attributes. We can still have the span. Thanks to trace context correlation it is possible to have both the duration and the exception correlated with this span.

Why emit a log when the information can just be set on the span?

Adding something like Span.End(Exception | Error) does not seem to support the case when an exception is thrown which is not an error.

I don't know what the Error in Span.End(Exception | Error) means.

What do you mean by it not supporting the non-error case? Do you mean that it sets the span status to error when that isn't desired? Would you like something that allows the equivalent of Span.End(exception, SetStatus=False) without adding that much complexity to Span.End?

Moreover, this would couple the Trace API to Semantic Conventions which I do not like and find as an anti-pattern.

Why don't you like it? This coupling already exists, RecordException is in the API and uses semantic conventions. #4333 suggests an SDK method for setting exception attributes on logs. If there were dedicated exception fields in the data model / protobuf definition would that change things?

I would rather propose, adding a functionality in "semconv libraries" to create exception attributes from an exception.

Such a function could be useful but users shouldn't usually need it.

https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-exceptions recommends the following boilerplate:

  Span span = startSpan();
  try {
    ...
  } catch (IOException e) {
    span.recordException(e);
    span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName());
    span.setStatus(StatusCode.ERROR, e.getMessage());
    throw e;
  }

I find this painful to look at:

The span isn't ended anywhere
Other types of exceptions aren't worth recording?
There's no context propagation
So much completely standard boilerplate that should apparently just be copy-pasted elsewhere?

In Python there are THREE context managers which automatically take care of recording exceptions, setting the status to error, ensuring the span ends, and (except in the first case) propagates context: with span, with tracer.start_as_current_span, and with trace.use_span(span). In Logfire we add with logfire.span on top of that.

SDKs should add conveniences like this as much as the language allows so that users can easily follow best practices without copy-pasting or needing to remember things. Recording exceptions should be the default. Manually setting attributes should only be needed in unusual circumstances.
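
As a sketch of the kind of convenience I mean, here is the pattern reduced to the standard library. `SketchSpan` and `recording_span` are hypothetical stand-ins, not the real Python SDK context managers:

```python
from contextlib import contextmanager


class SketchSpan:
    """Hypothetical stand-in span used only for this sketch."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.status = "UNSET"
        self.ended = False


@contextmanager
def recording_span(name):
    span = SketchSpan(name)
    try:
        yield span
    except Exception as exc:
        # Record the exception and mark the span as failed, then let it propagate.
        span.attributes["exception.type"] = type(exc).__name__
        span.attributes["exception.message"] = str(exc)
        span.status = "ERROR"
        raise
    finally:
        # The span always ends, whether or not the body raised.
        span.ended = True


captured = None
try:
    with recording_span("charge-card") as span:
        captured = span
        raise TimeoutError("payment gateway timed out")
except TimeoutError:
    pass
```

The user writes one `with` line and gets recording, status, and span ending for any exception type, with nothing to copy-paste.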

Feb 26 '25 14:02 alexmojaki

@alexmojaki I see your point on exception + OK/UNSET.

So then Span.RecordException(ex) seems to give the right flexibility. It'd be best to rename it to SetException to highlight it's a single one. It'd also be a breaking change in the RecordException behavior, so we'd need to deprecate it and add a new method anyway.

@pellared Span.RecordException(ex) already couples trace api with semconv - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/exceptions.md.

Feb 26 '25 14:02 lmolkova

@lmolkova this sounds good. It sounds like you're saying that SetException (much like RecordException) wouldn't set the status code at all, that would be the job of higher-level conveniences like the context managers in Python.

Feb 26 '25 14:02 alexmojaki

should we record stack trace on span at all?

Not by default, but opt-in sounds ok

@lmolkova I find this a very surprising default, can you explain? https://opentelemetry.io/docs/specs/otel/trace/exceptions/ says that the attribute SHOULD be filled. https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/ says it's recommended. The Python SDK doesn't give users an option to not record it.

should instrumentations report exception on the span and exception on the log?

I believe yes

The only reason I can see to do this is if you want to put the stacktrace in a log so that you can avoid putting it in the span. Are there other reasons? And if that's the reason, why is that helpful?

Feb 26 '25 15:02 alexmojaki

@pellared Span.RecordException(ex) already couples trace api with semconv

~~@lmolkova, I am aware and I do not like this design. In my opinion, API should not be coupled to Semantic Conventions. AFAIK Span.RecordException is the only place where API is coupled to the Semantic Conventions.~~

EDIT: Actually this is the SDK which is coupled to semconv which is not that bad 😅

Feb 26 '25 16:02 pellared

Related OTel Go "hanging" PR:

https://github.com/open-telemetry/opentelemetry-go/pull/5762

I think it is safe to start with Semantic Conventions to allow using exception.* attributes directly on the span and consider enhancing the Trace API as a follow-up. I am cautious with proposals about adding methods to API interfaces/abstractions as these can be seen as breaking changes in many languages.

Feb 26 '25 23:02 pellared

@trask, quoting you in https://github.com/open-telemetry/opentelemetry-specification/pull/4430:

However, some use cases are now being prioritized based on specific backend systems and query languages, which deviates from the original vision.

Then let me focus for a moment purely on the SDK/API side.

With this in mind, the Log SIG discussed and feel that it would be most consistent to emit all exceptions as Logs.

Then I think it would also be reasonable to aim to consistently emit all significant exceptions at least somewhere. Missing exceptions would be worse, right?

Now, quoting myself above:

https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-exceptions recommends the following boilerplate:

  Span span = startSpan();
  try {
    ...
  } catch (IOException e) {
    span.recordException(e);
    span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName());
    span.setStatus(StatusCode.ERROR, e.getMessage());
    throw e;
  }

I find this painful to look at:

The span isn't ended anywhere

Other types of exceptions aren't worth recording?

There's no context propagation

So much completely standard boilerplate that should apparently just be copy-pasted elsewhere?

In Python there are THREE context managers which automatically take care of recording exceptions, setting the status to error, ensuring the span ends, and (except in the first case) propagates context: with span, with tracer.start_as_current_span, and with trace.use_span(span). In Logfire we add with logfire.span on top of that.

SDKs should add conveniences like this as much as the language allows so that users can easily follow best practices without copy-pasting or needing to remember things. Recording exceptions should be the default.

Is Python the exception here? Do other language SDKs not have these conveniences? Does Java not have any similar language constructs that would allow something like this? Does everyone remember to manually record exceptions all the time?

And for those languages that can and do have these conveniences, how would they work? Will TracerProviders receive a LoggerProvider? Will they need to use deprecated span events?

Mar 18 '25 21:03 alexmojaki

It's not even just about SDK conveniences. What happens to all the instrumentations that only create spans (because they've never needed to record anything that doesn't have a duration, including handled exceptions) that want to continue recording unhandled exceptions in those spans? Do they all require logging to be configured to not do something deprecated?

Mar 19 '25 13:03 alexmojaki

Importantly, does that mean that projects that aren't using OTel logging are simply not going to be able to observe exceptions?

Mar 19 '25 13:03 HaloFour

does that mean that projects that aren't using OTel logging are simply not going to be able to observe exceptions?

this is not true

projects that aren't sending OTLP logs anywhere will be able to configure their SDK to attach exceptions (and other events) to spans (see #4430)

Mar 19 '25 14:03 trask