dd-trace-dotnet
Only the first scope is tied to the parent; subsequent requests appear as orphans.
I have noticed that after the first request, the web API calls in subsequent requests are not tied to the parent request. I have verified that both the NuGet package and the MSI are running the same version, 1.28.0. Rebuilding the application or recycling the application pool (in IIS) fixes the issue, but it comes back after the first request. The strange behaviour is that it works initially and then stops!
The call chain is as follows: .NET Framework Web Service -> Remoting
Creating a new scope at the web service with: using (var scope = Tracer.Instance.StartActive(nameof(CancelMerchantLog)))
Creating a new scope at the remoting method with: using (var scope = Tracer.Instance.StartActive(nameof(LoadMerchantLog)))
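Put together, the two pieces look roughly like the sketch below. This is a simplified illustration, not the actual application code: the class names MerchantWebService and MerchantLogService are placeholders, the remoting channel setup is omitted, and only the StartActive calls and method names are taken from the description above. It assumes the Datadog.Trace NuGet package is referenced.

```csharp
using System;
using Datadog.Trace;

// Placeholder for the object exposed over .NET Remoting (class name is hypothetical).
public class MerchantLogService : MarshalByRefObject
{
    public void LoadMerchantLog()
    {
        // Custom span on the remoting side. It is expected to become a child
        // of whichever scope is active when this method starts executing.
        using (var scope = Tracer.Instance.StartActive(nameof(LoadMerchantLog)))
        {
            // ... load the merchant log ...
        }
    }
}

// Placeholder for the .NET Framework web service (class name is hypothetical).
public class MerchantWebService
{
    // In the real application this would be a remoting proxy.
    private readonly MerchantLogService _remote;

    public MerchantWebService(MerchantLogService remote)
    {
        _remote = remote;
    }

    public void CancelMerchantLog()
    {
        // Custom span on the web service side; this is the intended parent.
        using (var scope = Tracer.Instance.StartActive(nameof(CancelMerchantLog)))
        {
            _remote.LoadMerchantLog();
        }
    }
}
```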
In the screenshot you can see that the LoadMerchantLog method is tied to the parent scope CancelMerchantLog,
but in subsequent (identical) requests it is not.
- Instrumentation mode: Automatic + NuGet
- Tracer version: 1.28.0
- OS: Windows 10
- CLR: .NET Framework 4.7.2
Hello,
I can think of two possibilities, but it's not clear from the screenshots which one it is:
- If the aspnet.request span is tied to the parent but not the LoadMerchantLog span, then it probably means you are creating the LoadMerchantLog span outside of the parent execution context (are you using a dispatcher or something?)
- If the aspnet.request span is not tied to the parent then it means that the distributed headers are lost, though it's not clear why it would happen
It's definitely case #1: the automatically generated instrumentation is always properly linked to the parent function. It's only the custom spans that seem to work initially and then stop linking to the parent spans. To get it working again I have to recycle the application pool.
As you can see, in the first request aspnet.request has the proper child LoadMerchantLog (custom span). In the second request, the relationship is lost.
We have been looking at this issue a bit more deeply and found that on the first call to remoting, the trace IDs of the web service and .NET remoting are aligned. However, on subsequent calls the trace IDs are different and the parent ID is missing for the remoting instrumentation.
As an example, consider the following scenario, where everything aligns properly.
However on subsequent calls, the Trace Id is a new one and the Parent Id is missing.
Hi @TalalTayyab. I see that in your example above, you have different Service/Operation names for the working vs broken cases. Are you finding that this problem is limited to specific services or servers?
I think to get to the bottom of this it would be useful to collect more detailed info about your configuration. Can you please create a support ticket by emailing [email protected]? Please include a link to this github issue so we can tie the problems together. Thanks!
Hi @andrewlock - good pickup. These are the same methods/operations; I had renamed the Datadog tags when I took the second screenshot (since updated). I will send an email to Datadog. Thanks for looking into this.
Some other facts -
- The issue occurs only with .NET remoting; normal web server -> web server calls appear fine.
- The issue can be reproduced across different machines and environments.
- ActiveScope is null on the remoting side - and StartActive appears as unrelated
- Tried with versions 1.27.0 and 1.28.0 of the Datadog .NET Tracer library and NuGet package
- The first remoting call appears properly related to the parent. Usually the requests afterwards appear as unrelated
- It also seems to impact the automatically instrumented code (like SQL queries).
- Very rarely, we found the issue would disappear randomly until we restarted the app pool. For that period, all calls would appear as related, not just the first. After restarting the app pool, the issue would reappear.
I'm closing this ticket as stale (over one year). I hope your issue was resolved! If not, definitely contact support and try to include a reference to the original support ticket and this github issue. Thanks!