
[BUG]: Error stacktrace and Exception type not being set properly

Open agentCalculator opened this issue 9 months ago • 7 comments

Tracer Version(s)

1.70.0

Go Version(s)

go1.22.0

Bug Report

func recordErrorFromCtx(ctx context.Context, err error, markFailure bool, attrs ...attribute.KeyValue) {
	if err == nil {
		return
	}

	span := trace.SpanFromContext(ctx)

	if markFailure {
		span.SetStatus(codes.Error, err.Error())
	}

	span.RecordError(err, trace.WithStackTrace(true), trace.WithAttributes(attrs...))
}

However, no attributes are added to the span, and the stack trace is not useful. Judging by the stack trace I do get, the actual stack trace appears to be overwritten with frames from OTel's internal middlewares and interceptors: every error shows the stack trace of the last executed function.

Reproduction Code

No response

Error Logs

No response

Go Env Output

No response

agentCalculator avatar Apr 14 '25 11:04 agentCalculator

Can someone please guide me on this so I can debug the issue further and try to fix it? It is difficult to debug errors and panics without a proper, relevant stack trace.

Thanks!!

agentCalculator avatar Apr 17 '25 04:04 agentCalculator

Hi @agentCalculator ,

Thanks for reaching out. It seems we may have overlooked the RecordError API when adding OpenTelemetry drop-in support. I'm currently looking into alternatives and will update you once I have more information.

mtoffl01 avatar Apr 18 '25 14:04 mtoffl01

Hi @mtoffl01

The stack trace is recorded inside the SDK, which obtains it from the runtime. So it seems we get this meaningless stack trace because the SDK and the application are not on the same stack.
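To illustrate the point about where the stack is captured, here is a minimal standard-library sketch. It does not use the OTel SDK or dd-trace-go; `failingOperation`, `recordLater`, `captureStack`, and `stackContains` are hypothetical names. It only shows that `runtime.Callers` sees the frames that are live at the moment of capture, so a stack taken after the failing function has returned no longer includes that function.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
)

// captureStack returns the function names on the current goroutine's stack,
// starting at the caller of captureStack.
func captureStack() []string {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and captureStack
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

// failingOperation stands in for application code where the error originates.
func failingOperation() error {
	return errors.New("boom")
}

// recordLater stands in for a middleware or SDK hook that only sees the error
// after failingOperation has already returned.
func recordLater(err error) []string {
	_ = err // a plain error value carries no stack information
	return captureStack()
}

// stackContains reports whether a function with the given short name is in stack.
func stackContains(stack []string, name string) bool {
	for _, fn := range stack {
		if strings.HasSuffix(fn, "."+name) {
			return true
		}
	}
	return false
}

func main() {
	err := failingOperation()
	stack := recordLater(err)
	fmt.Println("has recordLater:", stackContains(stack, "recordLater"))           // true
	fmt.Println("has failingOperation:", stackContains(stack, "failingOperation")) // false
}
```

Under these assumptions, a relevant stack can only be obtained by capturing it where the error is created, before control returns to any middleware.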

Please let me know which alternatives you are looking into. I am also trying to debug this and find a solution.

agentCalculator avatar Apr 18 '25 14:04 agentCalculator

Hey @agentCalculator ,

Just to make sure I'm following, we've identified two problems here, correct?

  1. The stack trace on a [grpc server?] span is neither useful nor the stack trace you'd expect
  2. Attempts to overwrite this meaningless stack trace with a custom one via the RecordError API have no effect

Please confirm. Additionally, clarify the kind of error you expect to see -- I assume from your grpc server.

In the meantime, instead of the RecordError API, you can customize the error on the span as follows: pass the span to the EndOptions function together with the tracer.WithError FinishOption carrying your error. You can check out the example here: ddotel.EndOptions(child, ddtracer.WithError(err)), where child is the span.

mtoffl01 avatar May 01 '25 20:05 mtoffl01

Yes, both points are correct.

Error messages are recorded correctly, but the type assigned is errors.errorString for every error.

Until then I will try the Finish and EndOptions functions, but I expect the stack trace issue to persist, since the runtime stacks of the SDK and my service are different and stack frames are recorded at runtime.

agentCalculator avatar May 29 '25 11:05 agentCalculator

@agentCalculator It seems the ddotel package creates a new error inside span.End() and passes it to tracer.WithError(). When the OTel status code is set to Error, that error is built with the Go standard errors package from the status description supplied to span.SetStatus. In your case the status description is err.Error(), which explains why error.message is correct and why the type is always errors.errorString.

https://github.com/DataDog/dd-trace-go/blob/fe9272dcb82745b2fd352f7bb54dd0db4d96d1c1/ddtrace/opentelemetry/span.go#L85-L88

https://github.com/DataDog/dd-trace-go/blob/fe9272dcb82745b2fd352f7bb54dd0db4d96d1c1/ddtrace/tracer/span.go#L424-L432
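The effect described above can be reproduced with the standard library alone. This is a sketch, not the ddotel code: `NotFoundError`, `flatten`, and `typeName` are hypothetical names, and `flatten` only mimics the pattern of rebuilding an error from a description string.

```go
package main

import (
	"errors"
	"fmt"
)

// NotFoundError is a hypothetical application error type used for illustration.
type NotFoundError struct{ ID string }

func (e *NotFoundError) Error() string { return "not found: " + e.ID }

// flatten mimics the pattern described above: a new error is built from a
// description string, so only the message survives.
func flatten(description string) error {
	return errors.New(description)
}

// typeName reports the dynamic type of err as printed by the %T verb.
func typeName(err error) string { return fmt.Sprintf("%T", err) }

func main() {
	original := &NotFoundError{ID: "42"}
	rebuilt := flatten(original.Error())

	// Same message, different dynamic type.
	fmt.Println(typeName(original), "|", original)
	fmt.Println(typeName(rebuilt), "|", rebuilt)

	// The original type can no longer be recovered from the rebuilt error.
	var target *NotFoundError
	fmt.Println("errors.As finds NotFoundError:", errors.As(rebuilt, &target))
}
```

The rebuilt error has the dynamic type `*errors.errorString` regardless of what the original error was, which matches the symptom reported in this thread.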

I have explained the issue in detail here and have proposed a solution.

kmrgirish avatar Jun 30 '25 07:06 kmrgirish

Given the context propagation issue between libraries/packages and the application, we need to record the runtime stack frames in the application itself to get the stack trace of the actual error site rather than that of middlewares or libraries.

For now, until the issue is actually fixed, we set a custom error type, fingerprint, and stack from the application itself by re-wrapping the error.
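A rough sketch of what such re-wrapping might look like, using only the standard library. The names `tracedError`, `wrap`, `loadUser`, and `errNoRows` are hypothetical and this is not the exact code used in the workaround. How the captured type and stack are then attached to the span (and which attribute names the backend honors) is left out, since the thread does not confirm it.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
)

// tracedError is a hypothetical wrapper that carries a custom error kind and
// the program counters captured at the point where the error was wrapped.
type tracedError struct {
	kind  string
	cause error
	stack []uintptr
}

func (e *tracedError) Error() string { return e.kind + ": " + e.cause.Error() }

// Unwrap keeps errors.Is and errors.As working on the underlying cause.
func (e *tracedError) Unwrap() error { return e.cause }

// StackTrace formats the frames captured when the error was wrapped.
func (e *tracedError) StackTrace() string {
	if len(e.stack) == 0 {
		return ""
	}
	var b strings.Builder
	frames := runtime.CallersFrames(e.stack)
	for {
		f, more := frames.Next()
		fmt.Fprintf(&b, "%s\n\t%s:%d\n", f.Function, f.File, f.Line)
		if !more {
			break
		}
	}
	return b.String()
}

// wrap must be called at the error site so the stack reflects application code.
func wrap(kind string, err error) error {
	if err == nil {
		return nil
	}
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and wrap itself
	return &tracedError{kind: kind, cause: err, stack: pcs[:n]}
}

var errNoRows = errors.New("no rows")

// loadUser stands in for application code that fails.
func loadUser() error {
	return wrap("UserLookupError", errNoRows)
}

func main() {
	err := loadUser()
	var te *tracedError
	if errors.As(err, &te) {
		fmt.Println(te.Error())
		fmt.Print(te.StackTrace())
	}
}
```

Because the stack is captured inside `wrap`, it reflects the call path at the error site no matter where the error is later recorded.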

Please let us know if anyone has a better approach to this.

agentCalculator avatar Sep 11 '25 11:09 agentCalculator