opentelemetry-go-contrib icon indicating copy to clipboard operation
opentelemetry-go-contrib copied to clipboard

Add support to record errors from lambda handlers

Open austindrenski opened this issue 1 year ago • 4 comments

Currently, otellambda doesn't update the instrumentation-provided span based on the result of the wrapped handler.

I was surprised by this, as most of my OTel instrumentation work involves open-telemetry/opentelemetry-dotnet, so this may well be a sensible default for golang, but given the breadth and scale of the OTel projects, I figured this might also just be an oversight, so wanted to open a PR and contribute some code.

This PR adds two config methods:

  1. WithRecordError(bool) which instructs the wrappedHandler to call span.RecordError(...) if wrappedHandler.Handler.Invoke(...) returns an error.
  2. WithSetStatus(bool) which instructs the wrappedHandler to call span.SetStatus(...) if wrappedHandler.Handler.Invoke(...) returns an error.

Off-topic, but in the dotnet OTel libraries, it's also common to provide enrichment delegates which can be used to filter/manipulate the instrumentation-provided span in other ways (e.g. setting custom attributes/tags based on the exception being thrown). I considered adding similar functionality here, but decided to hold off on doing anything more until the code owners have had an opportunity to weigh in.

Background

based on go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda v0.46.1

sanitized screencap of Datadog showing a trace's flame graph and error list

sanitized screencap of Datadog showing an APM error graph showing a 0% error rate before and a non-zero error rate after

Sample code based on version 0.0.112 in the above screencap

func main() {
	lambda.Start(otellambda.InstrumentHandler(handle))
}

func handle(ctx context.Context, data []byte) error {
	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

Sample code based on version 0.0.113 in the above screencap

Note that our handler is no longer creating its own internal span. We can work around this by defining our own wrappingHandler, but the path of least resistance is to just work with the instrumentation-provided span.

func main() {
	lambda.Start(otellambda.InstrumentHandler(handle))
}

func handle(ctx context.Context, data []byte) error {
-	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")
+	span := trace.SpanFromContext(ctx)

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

Sample code based on this patch

Now we can create an internal span in our handler and choose whether errors recorded in this layer should also be passed back up to the instrumentation-provided span to be recorded there as well.

(Example here might be overly simplified, but underlying all of the required obfuscation there is a real-world business case for this. I can provide more details upon request if required.)

func main() {
-	lambda.Start(otellambda.InstrumentHandler(handle))
+	lambda.Start(otellambda.InstrumentHandler(handle, otellambda.WithRecordError(true), otellambda.WithSetStatus(true)))
}

func handle(ctx context.Context, data []byte) error {
-	span := trace.SpanFromContext(ctx)
+	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

austindrenski avatar Dec 06 '23 14:12 austindrenski

Why would someone explicitly not want to record errors? (do we need the option? How about recording the error all the time?)

dmathieu avatar Dec 06 '23 15:12 dmathieu

Why would someone explicitly not want to record errors? (do we need the option? How about recording the error all the time?)

Great question. I think there are two parts to the answer:

  1. IMO we could elide WithSetStatus(...) and take the stance that otellambda will always call span.SetStatus(codes.Error, err.Error()) if the wrapped handler returns an error.

    That's actually what I wrote initially, treating it was an outright bug to not propagate the error status to the parent span (it certainly caused me a headache in production!), but I eventually pulled back and hid it behind a config option to avoid breaking any legitimate workflows (doubtful as I am that they even exist).

    That said, I'm more than happy to elide WitSetStatus(...) if everyone's amenable. Would indeed offer a better happy path out-of-the-box.

  2. However, it's a slightly different story for WithRecordError(...). At least in the dotnet world, there's an effort to avoid adding exception events to a span without user opt-in.

    There are better explanations out there, but IIRC they mostly boil down to (A) more span events == more data to export, (B) recording an exception == serialization == allocations.

    (Should note that dotnet exceptions by default include the message + the stack trace, so recording an exception as a span event can have a significant impact on span size. This might matter less in golang where err.Error() seems to (often? always?) return just the error message.)

austindrenski avatar Dec 06 '23 16:12 austindrenski

Does this comment applies to this PR ?

How about simply adding span.SetStatus(codes.Error, err.Error()) when an error is returned?

I think this is also inline with https://github.com/open-telemetry/opentelemetry-go-contrib/pull/4677#issuecomment-1843250118.

pellared avatar Dec 07 '23 20:12 pellared

Does this comment applies to this PR ?

I've had some real-world use cases where we wanted to record that an error occurred without setting the span status to error (or returning an error for that matter).

If you'll allow a super contrived example, imagine you're (for whatever reason) hand-rolling HTTP backoff-and-retry code wherein you may want to record the individual errors that occurred caused each of the eventual retries.

I think the value proposition is clearer in languages where the SDK opts to include stack traces by default, but for golang, it looks like we have to explicitly opt-in each time:

span.RecordError(err, trace.WithStackTrace(true))

which now makes me think we should just expose this at the same time, e.g.,

func WithRecordError(recordError bool, recordStackTrace bool) Option {
	return optionFunc(func(c *config) {
		c.RecordError = recordError
		c.RecordStackTrace = recordStackTrace
	})
}
	response, err := h.handler.Invoke(ctx, payload)
	if err != nil {
		if h.instrumentor.configuration.RecordError {
-			span.RecordError(err)
+			span.RecordError(err, span.WithStackTrace(h.instrumentation.RecordStackTrace))
		}
		if h.instrumentor.configuration.SetError {
			span.SetStatus(codes.Error, err.Error())
		}
		return nil, err
	}

How about simply adding span.SetStatus(codes.Error, err.Error()) when an error is returned?

I think this is also inline with #4677 (comment).

I'm definitely in favor of just calling span.SetStatus(codes.Error, err.Error()) by default. Since no one else has chimed in opposed to it so far, I'll go ahead and update the PR to reflect that change.

austindrenski avatar Dec 08 '23 00:12 austindrenski