opentelemetry-dotnet icon indicating copy to clipboard operation
opentelemetry-dotnet copied to clipboard

(AspNet.TelemetryHttpModule) Activity.Current not restored before all IIS LifeCycle events

Open chris-smith-zocdoc opened this issue 3 years ago • 5 comments

Bug Report

Activity.Current lost in some IIS lifecycle events

We have a custom HttpApplication with handlers for most IIS lifecycle events. These events can start child spans (mostly sql) that should be attached to the current request trace. For a percentage of the IIS lifecycle events, we are seeing traces that don't appear to be properly restoring Activity.Current, causing child spans to start their own trace.

OpenTelemetry = 1.2.0
OpenTelemetry.Api = 1.2.0
OpenTelemetry.Exporter.Zipkin = 1.2.0
OpenTelemetry.Instrumentation.AspNet = 1.0.0-rc9.2
OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule = 1.0.0-rc9.2
OpenTelemetry.Instrumentation.SqlClient = 1.0.0-rc9.2

Runtime version net48

Symptom

I believe Activity.Current is null

What is the expected behavior?

Activities started in IIS lifecycle methods should be attached to the request trace to which they belong

What is the actual behavior?

The Activities are orphaned, creating single span traces.

Additional Context

We aren't seeing this in all lifecycle events, only some of them and only a certain percentage of the time.

Here is the approximate percentage of time we're seeing this occur by lifecycle method. It happens most frequently in PreSendRequestHeaders (nearly every time)

Unaffected

IIS Life Cycle Method   % parent span exists
PostAuthenticateRequest	      1
PostMapRequestHandler	      1
OnAuthenticate	              1
BeginRequest	              1
RequestHandlerExecute	      0.99999 (assuming this is probably not a real issue)

Affected by the bug

PreRequestHandlerExecute	0.98452
PostRequestHandlerExecute	0.98104
PostLogRequest	                0.83814
PreSendRequestHeaders	        0.0085

How are we tracking this? We have a custom span processor that adds the current lifecycle method to each span by reading it from HttpContext.Items

chris-smith-zocdoc avatar May 17 '22 21:05 chris-smith-zocdoc

I believe this could be resolved by changing this line https://github.com/open-telemetry/opentelemetry-dotnet/blob/55c5dd18e9bcb8f9f728aa1a0557bd5157c88362/src/OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule/TelemetryHttpModule.cs#L88-L91

to

if (!context.IsPostNotification)
{
    ActivityHelper.RestoreContextIfNeeded(context.ApplicationInstance.Context);
}

chris-smith-zocdoc avatar May 17 '22 22:05 chris-smith-zocdoc

@chris-smith-zocdoc

So that rule is just something we carried over from the original version here: https://github.com/aspnet/Microsoft.AspNet.TelemetryCorrelation/blob/50f0acb7a0cb164d834eb91b839959a841577302/src/Microsoft.AspNet.TelemetryCorrelation/TelemetryCorrelationHttpModule.cs#L84

I don't think anyone is opposed to changing it.

Could you possibly share a small repo that reproduces the problem so I can take it for a spin?

CodeBlanch avatar Jun 15 '22 17:06 CodeBlanch

Thanks @CodeBlanch I've setup a minimum reproduction here https://github.com/chris-smith-zocdoc/OtelSample

You should be able to open the solution and hit F5 to run under IIS Express to see this easily.

Set a breakpoint on https://github.com/chris-smith-zocdoc/OtelSample/blob/master/Global.asax.cs#L39

chris-smith-zocdoc avatar Jun 17 '22 00:06 chris-smith-zocdoc

@chris-smith-zocdoc Thanks for that code, very helpful! I took it for a spin, here's my findings...

It is a bit more complicated than simply removing context.CurrentNotification == RequestNotification.ExecuteRequestHandler.

The order of events firing in OnExecuteRequestStep looks like this:

  • BeginRequest (Activity started)
  • AuthenticateRequest
  • AuthorizeRequest
  • ResolveRequestCache
  • MapRequestHandler
  • AcquireRequestState
  • PreExecuteRequestHandler
  • ExecuteRequestHandler
  • ReleaseRequestState
  • UpdateRequestCache
  • LogRequest
  • EndRequest (Activity is stopped)
  • SendResponse (Application_PreSendRequestHeaders/PreSendRequestContent called at this point)

That last line is the important one. Even if we call the restore code, the Activity has been stopped so there is nothing to restore 🤣

We could move the stop to the SendResponse event. I need to test this IIS URL Rewrite logic in there to make sure that works. BUT, let me ask first, @cijothomas @alanwest @reyang how do we feel about changing how this instrumentation works so that some of the time spent sending the response starts to be included in the span duration?

CodeBlanch avatar Jun 18 '22 04:06 CodeBlanch

@CodeBlanch that may be true for PreSendRequestHeaders but not the other events which were showing smaller amounts of loss (~2% to 16%)

PreRequestHandlerExecute, PostRequestHandlerExecute, PostLogRequest	  

So I think there are two decisions here:

  • can we restore before all events? I'd be happy to send a pr for this
  • should we move the Activity stop call

chris-smith-zocdoc avatar Aug 16 '22 15:08 chris-smith-zocdoc

@CodeBlanch I opened https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/761 to restore the context before all events

chris-smith-zocdoc avatar Nov 08 '22 15:11 chris-smith-zocdoc