azure-functions-host
Performance: Optimize MetricsEventManager collection and rollup
Part of #7908 (and supersedes #7927, which has some history/discussion for reference). These two optimizations are paired because they're both in the metrics path and the locks/contention come into play from both `FunctionStarted` and `FunctionCompleted`, so both need to improve to make progress.
FunctionsActivityTracker
This is a non-trivial source of allocations under load. We flush the current `_functionMetricsQueue` collection every 5 seconds, and during that window we build up event allocations (of `FunctionMetrics`) only to summarize them and then hand them to the GC. Instead, we can track the metrics much more cheaply as we go. That's what this PR does, using a mutable struct (don't tell my mom) and a by-ref increment on it for each metric call. When it's time to record, we swap the dictionary out for a fresh one and iterate/record that snapshot.
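For readers, the shape of that change looks roughly like the sketch below. The names are illustrative only (the actual `FunctionsActivityTracker` code differs in details); the point is that each metric call mutates a struct entry in place rather than enqueueing an event object, and the flush path only pays for a dictionary swap plus one pass over at most distinct-function-count entries.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical summary type; the real tracker records more stages.
internal struct FunctionMetricSummary
{
    public long StartedCount;
    public long SucceededCount;
    public long FailedCount;
    public long TotalExecutionTimeInMs;
}

// Minimal sketch of "increment a mutable struct by ref, swap the dictionary at flush time".
internal sealed class FunctionActivitySketch
{
    private readonly object _sync = new object();
    private Dictionary<string, FunctionMetricSummary> _current = new();

    public void FunctionStarted(string functionName)
    {
        lock (_sync)
        {
            // Get (or add) the entry by ref and mutate it in place - no per-call event allocation.
            ref FunctionMetricSummary summary = ref CollectionsMarshal.GetValueRefOrAddDefault(_current, functionName, out _);
            summary.StartedCount++;
        }
    }

    public void FunctionCompleted(string functionName, bool success, long elapsedMs)
    {
        lock (_sync)
        {
            ref FunctionMetricSummary summary = ref CollectionsMarshal.GetValueRefOrAddDefault(_current, functionName, out _);
            if (success)
            {
                summary.SucceededCount++;
            }
            else
            {
                summary.FailedCount++;
            }
            summary.TotalExecutionTimeInMs += elapsedMs;
        }
    }

    // Called by the 5-second timer: swap in an empty dictionary and hand back the snapshot to record.
    public Dictionary<string, FunctionMetricSummary> SnapshotAndReset()
    {
        lock (_sync)
        {
            var snapshot = _current;
            _current = new Dictionary<string, FunctionMetricSummary>();
            return snapshot;
        }
    }
}
```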
MetricsEventManager
In here, `QueuedEvents` is another allocation and lock source, via the captures passed into the dictionary's `AddOrUpdate` arguments and the locking contention within (especially around updates in a load test).
Note that `EventKey` is a struct with more code than a `readonly record struct` would need, which would be much cleaner; this is because StyleCop has an intermittent compile bug with records that is fixed but not yet released. I added a comment around this - we can go back and swap it quickly later. That swap is predicated on case-sensitive comparison being okay. I regressed performance a bit in this PR to maintain compatibility, but I think we'd be okay dropping the case-insensitive comparison - if so, we can make that change and increase perf with minimal changes today (see a9e043753250ec30dc20ae1b1583aeaa9b67883a for the diff).
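For context, the two key shapes being discussed look roughly like the sketch below. The property names are illustrative, not necessarily the actual `EventKey` members; the general win of a struct key over a concatenated string key is that building it allocates nothing.

```csharp
using System;

// What the key could collapse to once the StyleCop fix ships - but default record
// equality is case-sensitive, hence the compatibility question above:
// internal readonly record struct EventKey(string EventName, string FunctionName);

// Hand-written, case-insensitive version kept for compatibility (slightly slower to hash/compare).
internal readonly struct EventKey : IEquatable<EventKey>
{
    public EventKey(string eventName, string functionName)
    {
        EventName = eventName;
        FunctionName = functionName;
    }

    public string EventName { get; }

    public string FunctionName { get; }

    public bool Equals(EventKey other) =>
        string.Equals(EventName, other.EventName, StringComparison.OrdinalIgnoreCase)
        && string.Equals(FunctionName, other.FunctionName, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => obj is EventKey other && Equals(other);

    public override int GetHashCode() => HashCode.Combine(
        StringComparer.OrdinalIgnoreCase.GetHashCode(EventName ?? string.Empty),
        StringComparer.OrdinalIgnoreCase.GetHashCode(FunctionName ?? string.Empty));
}
```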
Overall, this PR has the following benefits:
- Far fewer allocations per metric call
- Aggregation goes from many allocations to none (see the benchmarks in the outdated comments below for the LINQ overhead)
- Less LOH (large object heap) interaction as no collection gets very large (total ceiling cardinality is the distinct function count)
- Recording time is now O(1) rather than O(n^2)
- No locking around `QueuedEvents.AddOrUpdate`
- No capture allocations for `QueuedEvents`
It's meant to be a cheaper/faster way to get the same metrics. This helps in all cases, but even more so under load tests, where we aggregate more events per 5-second window.
Issue describing the changes in this PR
Resolves a component of #7908
Pull request checklist
- [x] My changes do not require documentation changes
- [ ] Otherwise: Documentation issue linked to PR
- [x] My changes should not be added to the release notes for the next release
- [ ] Otherwise: I've added my notes to `release_notes.md`
- [x] My changes do not need to be backported to a previous version
- [ ] Otherwise: Backport tracked by issue/PR #issue_or_pr
- [x] I have added all required tests (Unit tests, E2E tests)
Crank benchmarks
application | dev2-1 | eventsstruct2-7 | Δ |
---|---|---|---|
CPU Usage (%) | 7 | 7 | 0.00% |
Cores usage (%) | 202 | 200 | -0.99% |
Working Set (MB) | 308 | 359 | +16.56% |
Private Memory (MB) | 1,133 | 1,157 | +2.12% |
Build Time (ms) | 17,993 | 19,415 | +7.90% |
Start Time (ms) | 4,778 | 4,814 | +0.75% |
Published Size (KB) | 674,687 | 674,685 | -0.00% |
.NET Core SDK Version | 6.0.101 | 6.0.101 | |
Max CPU Usage (%) | 99 | 99 | 0.00% |
Max Working Set (MB) | 322 | 365 | +13.35% |
Max GC Heap Size (MB) | 128 | 157 | +22.66% |
Size of committed memory by the GC (MB) | 134 | 186 | +38.81% |
Max Number of Gen 0 GCs / sec | 13.00 | 13.00 | 0.00% |
Max Number of Gen 1 GCs / sec | 6.00 | 6.00 | 0.00% |
Max Number of Gen 2 GCs / sec | 1.00 | 1.00 | 0.00% |
Max Time in GC (%) | 6.00 | 7.00 | +16.67% |
Max Gen 0 Size (B) | 55,366,672 | 20,417,288 | -63.12% |
Max Gen 1 Size (B) | 9,674,736 | 10,629,888 | +9.87% |
Max Gen 2 Size (B) | 26,395,496 | 17,749,984 | -32.75% |
Max LOH Size (B) | 9,321,280 | 4,601,568 | -50.63% |
Max Allocation Rate (B/sec) | 333,640,760 | 326,232,888 | -2.22% |
Max GC Heap Fragmentation | 77 | 49 | -35.61% |
# of Assemblies Loaded | 201 | 201 | 0.00% |
Max Exceptions (#/s) | 279 | 261 | -6.45% |
Max Lock Contention (#/s) | 114 | 80 | -29.82% |
Max ThreadPool Threads Count | 32 | 30 | -6.25% |
Max ThreadPool Queue Length | 95 | 92 | -3.16% |
Max ThreadPool Items (#/s) | 36,150 | 36,719 | +1.57% |
Max Active Timers | 98 | 93 | -5.10% |
IL Jitted (B) | 1,054,878 | 1,056,139 | +0.12% |
Methods Jitted | 10,887 | 10,899 | +0.11% |
Avg Allocation Rate (B/sec) | 302,116,391 | 295,875,083 | -2.07% |
90th Allocation Rate (B/sec) | 331,095,488 | 323,239,744 | -2.37% |
load | dev2-1 | eventsstruct2-7 | Δ |
---|---|---|---|
CPU Usage (%) | 1 | 1 | 0.00% |
Cores usage (%) | 36 | 38 | +5.56% |
Working Set (MB) | 37 | 37 | 0.00% |
Private Memory (MB) | 357 | 357 | 0.00% |
Start Time (ms) | 0 | 0 | |
First Request (ms) | 350 | 342 | -2.29% |
Requests/sec | 8,072 | 8,183 | +1.38% |
Requests | 485,103 | 491,761 | +1.37% |
Mean latency (ms) | 13.95 | 13.49 | -3.30% |
Max latency (ms) | 179.16 | 163.82 | -8.56% |
Bad responses | 0 | 0 | |
Socket errors | 0 | 0 | |
Read throughput (MB/s) | 1.24 | 1.26 | +1.61% |
Latency 50th (ms) | 10.02 | 9.93 | -0.90% |
Latency 75th (ms) | 15.93 | 15.25 | -4.27% |
Latency 90th (ms) | 28.37 | 26.86 | -5.32% |
Latency 99th (ms) | 65.38 | 60.27 | -7.82% |
Have to break for boosters here, but I'll move this to draft and go figure out why the build is angry.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
While this is a ~1% gain, I'm not super happy with the structure here - we're still allocating in a slab mechanism to the LOH, which feels like we can do better. Going to poke at this a bit more.
Notes for later: we can replace the LINQ here with a dictionary build and do an atomic swap of the queue to eliminate the copy in a snapshot. Example benchmark:
```csharp
#LINQPad optimize+ // Enable compiler optimizations

// Namespace imports (via LINQPad query properties): BenchmarkDotNet.Attributes,
// BenchmarkDotNet.Running, System.Collections.Concurrent, System.Runtime.InteropServices

void Main()
{
    Util.AutoScrollResults = true;
    BenchmarkRunner.Run<LinqBenchmarks>();
}

[MemoryDiagnoser]
public class LinqBenchmarks
{
    private readonly ConcurrentQueue<FunctionMetrics> _functionMetricsQueue = new ConcurrentQueue<FunctionMetrics>();

    [GlobalSetup]
    public void Setup()
    {
        // Generate enough to make it non-trivial
        for (var i = 0; i < 100; i++)
        {
            for (var j = 0; j < 50; j++)
            {
                _functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.Started, 0));
                _functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.InProgress, 0));
                _functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.Succeeded, 4));
                _functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.Finished, 4));
            }
        }
    }

    [Benchmark]
    public void LinqMethod()
    {
        var aggregatedEventsPerFunction = from item in _functionMetricsQueue
                                          group item by item.FunctionName into functionGroups
                                          select new
                                          {
                                              FunctionName = functionGroups.Key,
                                              StartedCount = Convert.ToUInt64(functionGroups.Count(x => x.ExecutionStage == ExecutionStage.Started)),
                                              FailedCount = Convert.ToUInt64(functionGroups.Count(x => x.ExecutionStage == ExecutionStage.Failed)),
                                              SucceededCount = Convert.ToUInt64(functionGroups.Count(x => x.ExecutionStage == ExecutionStage.Succeeded)),
                                              TotalExectionTimeInMs = Convert.ToUInt64(functionGroups.Sum(x => Convert.ToDecimal(x.ExecutionTimeInMS)))
                                          };

        foreach (var functionEvent in aggregatedEventsPerFunction)
        {
            GC.KeepAlive(functionEvent);
        }
    }

    [Benchmark]
    public void DictionaryMethod()
    {
        var aggregatedEventsPerFunction = new Dictionary<string, MetricSummary>();
        foreach (var metric in _functionMetricsQueue)
        {
            if (!aggregatedEventsPerFunction.TryGetValue(metric.FunctionName, out var summary))
            {
                summary = default;
            }
            switch (metric.ExecutionStage)
            {
                case ExecutionStage.Started:
                    summary.StartedCount++;
                    break;
                case ExecutionStage.Failed:
                    summary.FailedCount++;
                    break;
                case ExecutionStage.Succeeded:
                    summary.SucceededCount++;
                    break;
            }
            summary.TotalExectionTimeInMs += metric.ExecutionTimeInMS;
            aggregatedEventsPerFunction[metric.FunctionName] = summary;
        }

        foreach (var functionEvent in aggregatedEventsPerFunction)
        {
            GC.KeepAlive(functionEvent);
        }
    }

    [Benchmark]
    public void DictionaryMarshallMethod()
    {
        var aggregatedEventsPerFunction = new Dictionary<string, MetricSummary>();
        foreach (var metric in _functionMetricsQueue)
        //while (_functionMetricsQueue.TryDequeue(out var metric)) // don't purge benchmark data...
        {
            ref var summary = ref CollectionsMarshal.GetValueRefOrAddDefault(aggregatedEventsPerFunction, metric.FunctionName, out _);
            switch (metric.ExecutionStage)
            {
                case ExecutionStage.Started:
                    summary.StartedCount++;
                    break;
                case ExecutionStage.Failed:
                    summary.FailedCount++;
                    break;
                case ExecutionStage.Succeeded:
                    summary.SucceededCount++;
                    break;
            }
            summary.TotalExectionTimeInMs += metric.ExecutionTimeInMS;
        }

        foreach (var functionEvent in aggregatedEventsPerFunction)
        {
            GC.KeepAlive(functionEvent);
        }
    }

    private struct MetricSummary
    {
        public long StartedCount;
        public long FailedCount;
        public long SucceededCount;
        public long TotalExectionTimeInMs;
    }
}

public readonly struct FunctionMetrics
{
    public readonly long ExecutionTimeInMS;
    public readonly ExecutionStage ExecutionStage;
    public readonly string FunctionName;

    public FunctionMetrics(string functionName, ExecutionStage executionStage, long executionTimeInMS)
    {
        FunctionName = functionName;
        ExecutionStage = executionStage;
        ExecutionTimeInMS = executionTimeInMS;
    }
}

public enum ExecutionStage : byte
{
    Started,
    InProgress,
    Finished,
    Failed,
    Succeeded
}
```
Results:
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|
LinqMethod | 1,585.7 μs | 30.33 μs | 36.10 μs | 74.2188 | 27.3438 | 1,228 KB |
DictionaryMethod | 829.9 μs | 13.61 μs | 12.06 μs | - | - | 11 KB |
DictionaryMarshallMethod | 585.1 μs | 11.32 μs | 13.47 μs | - | - | 11 KB |
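The benchmark above only covers the dictionary build; the atomic-swap half of the note would look roughly like the sketch below (reusing `FunctionMetrics`/`ExecutionStage` from the benchmark, with the queue field deliberately not `readonly` so it can be exchanged). This is a sketch of the idea, not the host code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Threading;

internal sealed class MetricsFlushSketch
{
    private struct MetricSummary
    {
        public long StartedCount;
        public long FailedCount;
        public long SucceededCount;
        public long TotalExecutionTimeInMs;
    }

    // Not readonly: it gets swapped wholesale at flush time.
    private ConcurrentQueue<FunctionMetrics> _functionMetricsQueue = new ConcurrentQueue<FunctionMetrics>();

    public void Add(FunctionMetrics metric) => _functionMetricsQueue.Enqueue(metric);

    public void Flush()
    {
        // Swap in a fresh queue; producers keep enqueueing into the new one while
        // we aggregate the detached snapshot - no intermediate copy of the queue.
        var snapshot = Interlocked.Exchange(ref _functionMetricsQueue, new ConcurrentQueue<FunctionMetrics>());

        var aggregated = new Dictionary<string, MetricSummary>();
        while (snapshot.TryDequeue(out var metric))
        {
            ref var summary = ref CollectionsMarshal.GetValueRefOrAddDefault(aggregated, metric.FunctionName, out _);
            switch (metric.ExecutionStage)
            {
                case ExecutionStage.Started:
                    summary.StartedCount++;
                    break;
                case ExecutionStage.Failed:
                    summary.FailedCount++;
                    break;
                case ExecutionStage.Succeeded:
                    summary.SucceededCount++;
                    break;
            }
            summary.TotalExecutionTimeInMs += metric.ExecutionTimeInMS;
        }

        // ...record `aggregated` here...
    }
}
```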
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
pinging @brettsam & @mathewc for a second pass!
bumping again :)
@fabiocav @mathewc Dean and I see two places to change to interlocked increments (107/132) to get more accuracy than the existing (pre-PR) code, which can drop some events, but I don't want to set everything up again if we're not going to do anything with it anyway. Should I just close this PR or try to update it later? There's a high time cost to re-profiling, so I'm asking before chasing anything here.
@NickCraver please go ahead with the changes. I've discussed this with the team and assigned some people to help drive the review (also including @mathewc).
@NickCraver any updates on this?
@liliankasem @fabiocav I'm swamped in Antares for the foreseeable future and don't have this set up after a machine pave. The totality of the changes I'm talking about is changing `queuedEvent.Count++;` and `queuedEvent.Value.Count++;` to `Interlocked.Increment` for total accuracy - is that something the team can take as part of this? (Should have push permissions.) It'll be a while before I get back to Functions in general with the current queue - apologies!
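For reference, the change being described amounts to swapping the non-atomic increments for atomic ones - roughly the following sketch, where `QueuedEvent` is a hypothetical stand-in for the queued-event entry type (the real counters live in `MetricsEventManager`):

```csharp
using System.Threading;

// Hypothetical stand-in for the queued event entry; only the counter matters here.
internal sealed class QueuedEvent
{
    // Must be a field (not a property) so it can be passed by ref to Interlocked.
    public long Count;
}

internal static class InterlockedCountSketch
{
    public static void Record(QueuedEvent queuedEvent)
    {
        // Before: queuedEvent.Count++;  // non-atomic read-modify-write, can lose counts under contention
        Interlocked.Increment(ref queuedEvent.Count);
    }
}
```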
No worries at all, thank you for updating us! I'll sync with @fabiocav about what we can do to get this merged from our side
Opened issue to address this in v5: https://github.com/Azure/azure-functions-host/issues/9173