ApplicationInsights-dotnet icon indicating copy to clipboard operation
ApplicationInsights-dotnet copied to clipboard

Polly based retries result into telemetry for retried requests not getting captured

Open RamjotSingh opened this issue 2 years ago • 7 comments

Microsoft publishes Polly based package which allows developers to configure retries for requests based on certain condition being true.

However, using this package results into retries not being captured in AI because AI thinks of them as duplicate requests and just captures the very first request.

I think this is happening due this following logic in AI package

I am adding a small repro of the issue. In startup.cs if you uncomment the line

AddHttpMessageHandler<TraceParentStrippingHandler>()

You will see 3 entries being recorded, otherwise only one (the very first request).

PollyAICompat.zip

RamjotSingh avatar Mar 13 '22 04:03 RamjotSingh

The zip file is empty can you upload it again?

HenrikSommer-Energinet avatar Apr 21 '22 10:04 HenrikSommer-Energinet

I think this is happening due this following logic in AI package

Yes, this is by design. For now the only workaround is to remove traceparent header for retries, so that ApplicationInsights no longer ignores it. Long term, this will be fixed via OpenTelemetry route, as OpenTelemetry folks are working on a specification on how to report telemetry from Http Retry/Redirects.

cijothomas avatar Apr 22 '22 15:04 cijothomas

@cijothomas If we remove the traceparent header for the retries, will the operation id of the logged entry be different from the operation id of the initial request? Because then we wouldn't be able to group calls by the same operation Id right? We'd have one call with operation id "abc" and then the retries would all have different operation id's?

karun-verghese avatar Apr 25 '22 02:04 karun-verghese

@cijothomas If we remove the traceparent header for the retries, will the operation id of the logged entry be different from the operation id of the initial request? Because then we wouldn't be able to group calls by the same operation Id right? We'd have one call with operation id "abc" and then the retries would all have different operation id's?

I don't think so. The parent's Activity.Id will still used to populate operationid,parentid for the DependencyTelemetry.

cijothomas avatar Apr 25 '22 14:04 cijothomas

In case anyone interested, heres the PR fixing this issue in OpenTelemetry https://github.com/open-telemetry/opentelemetry-dotnet/pull/3072

cijothomas avatar Jun 06 '22 23:06 cijothomas

@cijothomas If I am understanding the fix correctly, it is the same fix I had proposed offline to you right? Remove the headers so the system thinks of it being a unique request.

RamjotSingh avatar Jun 07 '22 03:06 RamjotSingh

Its not really a "fix". There is a spec in OpenTelemetry on how to report spans for retries. and OpenTelemetry is following that spec. Impln. wise - yes it means if there is a traceparent, its "overrridden" with the new traceparent, and every retry attempt creates a new span.

cijothomas avatar Jun 07 '22 04:06 cijothomas

Long term, this will be fixed via OpenTelemetry route

It looks like the change underlying the referenced OpenTelemetry PR (#3072) has been merged. Does that mean we should expected this issue to be resolved "the right way" in the near future (such that we can roll back remove traceparent header the workaround)?

(@cijothomas I didn't fully understand your most recent comment, apologies if my question is redundant.)

mirekkukla avatar Oct 13 '22 19:10 mirekkukla

@mirekkukla No fix for this is planned in this repo. The PR linked is showing how OpenTelemetry is solving this issue, and it'll help Application Insights users only if you are on the OTel Based SDK which is in preview mode now : https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=net

If you are using Application Insights' current SDK (the ones shipped from this repo), no fix has been done for this issue.

cijothomas avatar Jan 17 '23 15:01 cijothomas

This issue is stale because it has been open 300 days with no activity. Remove stale label or this will be closed in 7 days. Commenting will instruct the bot to automatically remove the label.

github-actions[bot] avatar Nov 14 '23 00:11 github-actions[bot]