newrelic-dotnet-agent
Unexpected thread usage increase
Description
Something is causing web applications instrumented by the agent to periodically use roughly 500 threads instead of around 50-60 threads.
Expected Behavior
We expect thread usage to increase when the agent is running because of the native threads required for certain .NET profiling API calls, metric sampling (CPU, Memory, Garbage Collection, Thread Info), sending data to New Relic, and continuation-based async timing, but the agent should not increase the thread count by a large amount.
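To put numbers on that expectation, a lightweight sampler like the sketch below can be run in a test application to timestamp when the jump from ~50-60 to ~500 threads happens. This is an illustrative, hypothetical snippet, not part of the agent; `Process.Threads` counts OS-level threads, so it includes the agent's native threads as well as managed ones.

```csharp
// Illustrative sampler (not agent code): log the OS-level thread count of the current
// process once a minute so the baseline vs. spike can be correlated with load and GC.
using System;
using System.Diagnostics;
using System.Threading;

class ThreadCountLogger
{
    static void Main()
    {
        using var timer = new Timer(_ =>
        {
            using var proc = Process.GetCurrentProcess();
            // Process.Threads includes native threads (profiler, GC, etc.), not just managed ones.
            Console.WriteLine($"{DateTime.UtcNow:O} threads={proc.Threads.Count}");
        }, null, TimeSpan.Zero, TimeSpan.FromMinutes(1));

        Console.ReadLine(); // keep the sampler alive while the load test runs
    }
}
```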
Troubleshooting or NR Diag results
See this internal link for some of the troubleshooting that has been done previously to try to discover what is causing the spike in thread usage. The link details which agent features and instrumentation were disabled, as well as the agent settings that were changed, to try to minimize the increase in thread usage.
Steps to Reproduce
N/A
Your Environment
N/A
Additional context
I've seen this thread explosion from time to time with some test applications during performance testing. However, I do not yet understand what exactly triggers it. Is it just a combination of circumstances such as:
1. A certain amount of load within the application
2. A certain amount of load on the system running the application
3. Samplers, harvests, completing transactions, and async continuations all needing to execute around the same time
4. A backlog of work that was delayed by a blocking GC? (See the thread-pool injection sketch after this list.)
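As a way to probe hypotheses 3 and 4, the sketch below shows generic .NET thread-pool behavior rather than anything agent-specific: when queued work items block instead of completing, the pool's starvation-avoidance logic injects additional worker threads, which is one plausible path to a sudden jump in thread count after a GC pause or a burst of simultaneous work. `ThreadPool.ThreadCount` requires .NET Core 3.0 or later.

```csharp
// Generic .NET ThreadPool behavior (not agent code): queueing many blocking work items
// starves the pool, and the runtime responds by injecting extra worker threads.
using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadInjectionDemo
{
    static void Main()
    {
        Console.WriteLine($"pool threads before: {ThreadPool.ThreadCount}");

        // Far more blocking work items than cores; none of them yield back to the pool.
        var tasks = new Task[200];
        for (int i = 0; i < tasks.Length; i++)
            tasks[i] = Task.Run(() => Thread.Sleep(5000));

        Thread.Sleep(10000); // give the pool time to react to the starvation
        Console.WriteLine($"pool threads after:  {ThreadPool.ThreadCount}");

        Task.WaitAll(tasks);
    }
}
```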
12/10/2021 - Let's do a sanity check first on this problem.
Based on my testing during Firefly, I was not able to see an increase as described.
I have left a test environment running with the infrastructure agent as described in this ticket and have a meeting scheduled with @nrcventura to review the results tomorrow before closing.
Here is the application under load: https://staging-one.newrelic.com/nr1-core/apm-nerdlets/overview/MjczMDcwfEFQTXxBUFBMSUNBVElPTnwxODk4NjI0Nw?account=273070&duration=10800000&filters=%28domain%20%3D%20%27APM%27%20AND%20type%20%3D%20%27APPLICATION%27%29&state=eea5a570-092c-9dfb-1d55-6a86f2c51aed
Here is the infrastructure query to show thread count for the process being instrumented/stressed: https://staging-one.newrelic.com/infra?account=273070&duration=10800000&state=33915c44-c65d-902c-69f3-19d44b8a6c4d
It is processing about 8k requests per minute, so any problematic behavior from the agent should be visible.
Summary of Findings
My general observation is that the .NET runtime will create as many threads as the OS can spare in high throughput applications. This increase does occur more quickly with the .NET Agent attached to an application, but I did not observe radically different maximum thread counts with/without the agent attached over time.
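One way to tell whether that growth comes from the managed worker pool rather than the agent's native threads would be to cap the pool and rerun the load test; if the process still climbs toward ~500 threads, the pool is not the source. The sketch below is a hedged outline of that experiment (it was not run for these findings), and the cap value of 100 is illustrative.

```csharp
// Sketch of an experiment (not run for these findings): cap the managed worker pool,
// then compare the process thread count under the same load with and without the cap.
using System;
using System.Threading;

class PoolCapExperiment
{
    static void Main()
    {
        ThreadPool.GetMaxThreads(out int workerMax, out int iocpMax);
        Console.WriteLine($"default max: worker={workerMax}, iocp={iocpMax}");

        // Bound worker threads well below the observed ~500-thread spike.
        // SetMaxThreads fails if the value is below the processor count or the current minimum.
        if (!ThreadPool.SetMaxThreads(100, iocpMax))
            Console.WriteLine("SetMaxThreads was rejected; choose a larger cap");

        // ...start the load test here and watch Process.GetCurrentProcess().Threads.Count.
    }
}
```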
After reviewing multiple performance counters related to overall application performance and CPU usage, my assertion is that the root cause of this issue is lock contention introduced by the agent. Without the agent attached, I saw a maximum of <100 contended locks per second in a high throughput application. With the agent attached, lock contention peaked at >30,000 per second. Microsoft has also flagged this metric as out of spec for high performance applications when customers have engaged them to troubleshoot application performance.
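Those contention rates can be sampled in-process on .NET Core 3.0+ via `Monitor.LockContentionCount`, the same count surfaced by the System.Runtime counters in dotnet-counters. The sketch below shows how such a per-second comparison can be made; it is not necessarily the tooling used for these findings.

```csharp
// Minimal sketch (not necessarily the tooling used for these findings): sample the
// cumulative Monitor.LockContentionCount once a second and print the per-second delta.
// Run under the same load with and without the agent attached to compare the two rates.
using System;
using System.Threading;

class ContentionSampler
{
    static void Main()
    {
        long previous = Monitor.LockContentionCount;
        while (true)
        {
            Thread.Sleep(1000);
            long current = Monitor.LockContentionCount;
            Console.WriteLine($"lock contentions/sec: {current - previous}");
            previous = current;
        }
    }
}
```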
The agent makes heavy use of interlocked counters and explicit locks to guard collections. If the lock contention could be reduced, I believe there would be less performance overhead.
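As a purely illustrative example of the kind of change that comment points at (this is not the agent's actual code), the first pattern below funnels every writer through a single monitor, which is exactly what the contention counter measures, while the second records the same data with a lock-free queue and an interlocked counter.

```csharp
// Hypothetical illustration only (not the agent's code): a lock-guarded list serializes
// every writer on one monitor, which shows up directly in the lock contention counter.
// A ConcurrentQueue plus an Interlocked counter keeps the hot path contention-free.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class MetricBuffer
{
    // Contended pattern: every Add from every request thread takes the same lock.
    private readonly object _lock = new object();
    private readonly List<double> _guarded = new List<double>();

    public void AddGuarded(double value)
    {
        lock (_lock) { _guarded.Add(value); }
    }

    // Lower-contention pattern: lock-free enqueue plus an interlocked counter;
    // a harvest thread can drain the queue without blocking request threads.
    private readonly ConcurrentQueue<double> _queue = new ConcurrentQueue<double>();
    private long _count;

    public void AddLockFree(double value)
    {
        _queue.Enqueue(value);
        Interlocked.Increment(ref _count);
    }
}
```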
This Issue has been marked stale after 90 days with no activity. It will be closed in 30 days if there is no activity.
https://issues.newrelic.com/browse/NEWRELIC-5587
Jira CommentId: 118742 Commented by chynes:
We can use this as a spike for general thread contention/related performance issues
Closed as not planned based on the findings from Josh on Feb 9.