
Performance: Optimize MetricsEventManager collection and rollup

Open · NickCraver opened this pull request 3 years ago · 10 comments

Part of #7908 (and supersedes #7927, which has some history discussion for reference). These optimizations are paired because both are in the metrics path, and the locks/contention come into play from both FunctionStarted and FunctionCompleted, so both really need to be improved to make progress.

FunctionsActivityTracker

This is a non-trivial source of allocations under load. We flush the current _functionMetricsQueue collection every 5 seconds, and in the meantime we build up event allocations (of FunctionMetrics) only to summarize them and hand them to the GC. Instead, we can track the metrics much more cheaply as we go. That's what this PR does: it uses a mutable struct (don't tell my mom) and a by-ref increment on it for each metric call. When it's time to record, we swap the dictionary out for a fresh one and iterate/record that snapshot.
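In rough strokes, the pattern being described looks like the sketch below. This is illustrative only: the type and member names are hypothetical and the synchronization is simplified, but it shows the mutable-struct, by-ref-increment, dictionary-swap shape.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical names; not the PR's actual types.
public struct FunctionActivityCounts
{
    public long Started;
    public long Failed;
    public long Succeeded;
    public long TotalExecutionTimeMs;
}

public class FunctionActivityRollup
{
    private readonly object _sync = new object();
    private Dictionary<string, FunctionActivityCounts> _counts = new Dictionary<string, FunctionActivityCounts>();

    // Hot path: one in-place, by-ref increment per metric call, no per-event object.
    public void FunctionCompleted(string functionName, bool success, long elapsedMs)
    {
        lock (_sync) // simplified; the actual change may synchronize differently
        {
            ref FunctionActivityCounts c = ref CollectionsMarshal.GetValueRefOrAddDefault(_counts, functionName, out _);
            if (success) { c.Succeeded++; } else { c.Failed++; }
            c.TotalExecutionTimeMs += elapsedMs;
        }
    }

    // Flush path (every 5 seconds): swap in a fresh dictionary and report the old one.
    // Ceiling cardinality is the number of distinct functions, so nothing grows with load.
    public Dictionary<string, FunctionActivityCounts> TakeSnapshot()
    {
        lock (_sync)
        {
            var snapshot = _counts;
            _counts = new Dictionary<string, FunctionActivityCounts>();
            return snapshot;
        }
    }
}
```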

MetricsEventManager

Here, QueuedEvents is another allocation and locking source: the dictionary AddOrUpdate arguments capture state (allocating closures), and there's lock contention inside the update path (especially visible in a load test).

Note that EventKey is a struct with more code than a readonly record struct would need; this is because StyleCop has an intermittent compile bug with records that is fixed but not yet released. I added a comment about this, so we can swap it over quickly later. The faster variant is predicated on case-sensitive matching being acceptable. I regressed performance a bit in this PR to maintain (case-insensitive) compatibility, but I think we'd be okay going case-sensitive; if so, we can do that today and gain more performance with minimal changes (see a9e043753250ec30dc20ae1b1583aeaa9b67883a for the diff).
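As a hedged illustration of the capture-free shape (the names and the long value below are hypothetical, not the PR's actual types): a value-type key lets the dictionary avoid per-call key normalization, and the TArg overload of ConcurrentDictionary.AddOrUpdate lets the delegates be static so no closure is allocated per call.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative struct key; the real EventKey has more code (see the StyleCop note above).
public readonly struct EventKey : IEquatable<EventKey>
{
    public readonly string EventName;
    public readonly string FunctionName;

    public EventKey(string eventName, string functionName)
    {
        EventName = eventName;
        FunctionName = functionName;
    }

    // OrdinalIgnoreCase keeps today's case-insensitive behavior (the compatibility cost
    // mentioned above); switching to Ordinal is the extra perf on the table.
    public bool Equals(EventKey other) =>
        string.Equals(EventName, other.EventName, StringComparison.OrdinalIgnoreCase)
        && string.Equals(FunctionName, other.FunctionName, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => obj is EventKey other && Equals(other);

    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(EventName ?? string.Empty)
        ^ StringComparer.OrdinalIgnoreCase.GetHashCode(FunctionName ?? string.Empty);
}

public class QueuedEventCounts
{
    private readonly ConcurrentDictionary<EventKey, long> _queuedEvents =
        new ConcurrentDictionary<EventKey, long>();

    // The TArg overload passes state explicitly, so the static lambdas capture nothing
    // and no closure object is allocated per call.
    public void Record(EventKey key, long count) =>
        _queuedEvents.AddOrUpdate(
            key,
            static (_, c) => c,
            static (_, existing, c) => existing + c,
            count);
}
```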

Overall, this PR has the following benefits:

  • Far fewer allocations per metric call
  • Many -> no allocations to aggregate (see benchmarks in outdated comments below for LINQ overhead)
  • Less LOH (large object heap) interaction as no collection gets very large (total ceiling cardinality is the distinct function count)
  • Recording time is now O(1) rather than O(n^2)
  • No locking around QueuedEvents.AddOrUpdate
  • No capture allocations for QueuedEvents

It's meant to be a cheaper/faster way to get the same metrics. This helps in all cases, but especially under load tests, where we aggregate more events per 5-second window.

Issue describing the changes in this PR

Resolves a component of #7908

Pull request checklist

  • [x] My changes do not require documentation changes
    • [ ] Otherwise: Documentation issue linked to PR
  • [x] My changes should not be added to the release notes for the next release
    • [ ] Otherwise: I've added my notes to release_notes.md
  • [x] My changes do not need to be backported to a previous version
    • [ ] Otherwise: Backport tracked by issue/PR #issue_or_pr
  • [x] I have added all required tests (Unit tests, E2E tests)

Crank benchmarks

| application | dev2-1 | eventsstruct2-7 | Δ |
|---|---:|---:|---:|
| CPU Usage (%) | 7 | 7 | 0.00% |
| Cores usage (%) | 202 | 200 | -0.99% |
| Working Set (MB) | 308 | 359 | +16.56% |
| Private Memory (MB) | 1,133 | 1,157 | +2.12% |
| Build Time (ms) | 17,993 | 19,415 | +7.90% |
| Start Time (ms) | 4,778 | 4,814 | +0.75% |
| Published Size (KB) | 674,687 | 674,685 | -0.00% |
| .NET Core SDK Version | 6.0.101 | 6.0.101 | |
| Max CPU Usage (%) | 99 | 99 | 0.00% |
| Max Working Set (MB) | 322 | 365 | +13.35% |
| Max GC Heap Size (MB) | 128 | 157 | +22.66% |
| Size of committed memory by the GC (MB) | 134 | 186 | +38.81% |
| Max Number of Gen 0 GCs / sec | 13.00 | 13.00 | 0.00% |
| Max Number of Gen 1 GCs / sec | 6.00 | 6.00 | 0.00% |
| Max Number of Gen 2 GCs / sec | 1.00 | 1.00 | 0.00% |
| Max Time in GC (%) | 6.00 | 7.00 | +16.67% |
| Max Gen 0 Size (B) | 55,366,672 | 20,417,288 | -63.12% |
| Max Gen 1 Size (B) | 9,674,736 | 10,629,888 | +9.87% |
| Max Gen 2 Size (B) | 26,395,496 | 17,749,984 | -32.75% |
| Max LOH Size (B) | 9,321,280 | 4,601,568 | -50.63% |
| Max Allocation Rate (B/sec) | 333,640,760 | 326,232,888 | -2.22% |
| Max GC Heap Fragmentation | 77 | 49 | -35.61% |
| # of Assemblies Loaded | 201 | 201 | 0.00% |
| Max Exceptions (#/s) | 279 | 261 | -6.45% |
| Max Lock Contention (#/s) | 114 | 80 | -29.82% |
| Max ThreadPool Threads Count | 32 | 30 | -6.25% |
| Max ThreadPool Queue Length | 95 | 92 | -3.16% |
| Max ThreadPool Items (#/s) | 36,150 | 36,719 | +1.57% |
| Max Active Timers | 98 | 93 | -5.10% |
| IL Jitted (B) | 1,054,878 | 1,056,139 | +0.12% |
| Methods Jitted | 10,887 | 10,899 | +0.11% |
| Avg Allocation Rate (B/sec) | 302,116,391 | 295,875,083 | -2.07% |
| 90th Allocation Rate (B/sec) | 331,095,488 | 323,239,744 | -2.37% |
| load | dev2-1 | eventsstruct2-7 | Δ |
|---|---:|---:|---:|
| CPU Usage (%) | 1 | 1 | 0.00% |
| Cores usage (%) | 36 | 38 | +5.56% |
| Working Set (MB) | 37 | 37 | 0.00% |
| Private Memory (MB) | 357 | 357 | 0.00% |
| Start Time (ms) | 0 | 0 | |
| First Request (ms) | 350 | 342 | -2.29% |
| Requests/sec | 8,072 | 8,183 | +1.38% |
| Requests | 485,103 | 491,761 | +1.37% |
| Mean latency (ms) | 13.95 | 13.49 | -3.30% |
| Max latency (ms) | 179.16 | 163.82 | -8.56% |
| Bad responses | 0 | 0 | |
| Socket errors | 0 | 0 | |
| Read throughput (MB/s) | 1.24 | 1.26 | +1.61% |
| Latency 50th (ms) | 10.02 | 9.93 | -0.90% |
| Latency 75th (ms) | 15.93 | 15.25 | -4.27% |
| Latency 90th (ms) | 28.37 | 26.86 | -5.32% |
| Latency 99th (ms) | 65.38 | 60.27 | -7.82% |

NickCraver · Dec 02 '21 14:12

Have to break for boosters here, but I'll move this to draft and go figure out why the build is angry.

NickCraver · Dec 02 '21 14:12

/azp run

NickCraver · Dec 02 '21 16:12

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] · Dec 02 '21 16:12

While this is a ~1% gain, I'm not super happy with the structure here - we're still allocating slabs onto the LOH, which feels like we can do better. Going to poke at this a bit more.

NickCraver · Dec 14 '21 22:12

Notes for later: we can replace the LINQ here with a dictionary build and do an atomic swap of the queue to eliminate the copy in a snapshot. Example benchmark:

```csharp
#LINQPad optimize+     // Enable compiler optimizations
// LINQPad namespace imports needed: System, System.Collections.Concurrent,
// System.Collections.Generic, System.Linq, System.Runtime.InteropServices,
// BenchmarkDotNet.Attributes, BenchmarkDotNet.Running (NuGet reference: BenchmarkDotNet)

void Main()
{
	Util.AutoScrollResults = true;
	BenchmarkRunner.Run<LinqBenchmarks>();
}

[MemoryDiagnoser]
public class LinqBenchmarks
{
	private readonly ConcurrentQueue<FunctionMetrics> _functionMetricsQueue = new ConcurrentQueue<FunctionMetrics>();
	
	[GlobalSetup]
	public void Setup()
	{
		// Generate enough to make it non-trivial
		for (var i = 0; i < 100; i++)
		{
			for (var j = 0; j < 50; j++)
			{
				_functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.Started, 0));
				_functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.InProgress, 0));
				_functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.Succeeded, 4));
				_functionMetricsQueue.Enqueue(new FunctionMetrics("Test Name" + j.ToString(), ExecutionStage.Finished, 4));
			}
		}
	}
	
	[Benchmark]
	public void LinqMethod()
	{
		var aggregatedEventsPerFunction = from item in _functionMetricsQueue
										  group item by item.FunctionName into functionGroups
										  select new
										  {
											  FunctionName = functionGroups.Key,
											  StartedCount = Convert.ToUInt64(functionGroups.Count(x => x.ExecutionStage == ExecutionStage.Started)),
											  FailedCount = Convert.ToUInt64(functionGroups.Count(x => x.ExecutionStage == ExecutionStage.Failed)),
											  SucceededCount = Convert.ToUInt64(functionGroups.Count(x => x.ExecutionStage == ExecutionStage.Succeeded)),
											  TotalExectionTimeInMs = Convert.ToUInt64(functionGroups.Sum(x => Convert.ToDecimal(x.ExecutionTimeInMS)))
										  };

		foreach (var functionEvent in aggregatedEventsPerFunction)
		{
			GC.KeepAlive(functionEvent);
		}
	}

	[Benchmark]
	public void DictionaryMethod()
	{
		var aggregatedEventsPerFunction = new Dictionary<string, MetricSummary>();

		foreach (var metric in _functionMetricsQueue)
		{
			if (!aggregatedEventsPerFunction.TryGetValue(metric.FunctionName, out var summary))
			{
				summary = default;
			}
			switch (metric.ExecutionStage)
			{
				case ExecutionStage.Started:
					summary.StartedCount++;
					break;
				case ExecutionStage.Failed:
					summary.FailedCount++;
					break;
				case ExecutionStage.Succeeded:
					summary.SucceededCount++;
					break;
			}
			summary.TotalExectionTimeInMs += metric.ExecutionTimeInMS;			
			aggregatedEventsPerFunction[metric.FunctionName] = summary;
		}

		foreach (var functionEvent in aggregatedEventsPerFunction)
		{
			GC.KeepAlive(functionEvent);
		}
	}

	[Benchmark]
	public void DictionaryMarshallMethod()
	{
		var aggregatedEventsPerFunction = new Dictionary<string, MetricSummary>();

		foreach (var metric in _functionMetricsQueue)
		//while (_functionMetricsQueue.TryDequeue(out var metric)) // don't purge benchmark data...
		{
			ref var summary = ref CollectionsMarshal.GetValueRefOrAddDefault(aggregatedEventsPerFunction, metric.FunctionName, out _);
			switch (metric.ExecutionStage)
			{
				case ExecutionStage.Started:
					summary.StartedCount++;
					break;
				case ExecutionStage.Failed:
					summary.FailedCount++;
					break;
				case ExecutionStage.Succeeded:
					summary.SucceededCount++;
					break;
			}
			summary.TotalExectionTimeInMs += metric.ExecutionTimeInMS;
		}

		foreach (var functionEvent in aggregatedEventsPerFunction)
		{
			GC.KeepAlive(functionEvent);
		}
	}

	private struct MetricSummary
	{
		public long StartedCount;
		public long FailedCount;
		public long SucceededCount;
		public long TotalExectionTimeInMs;
	}
}

public readonly struct FunctionMetrics
{
	public readonly long ExecutionTimeInMS;
	public readonly ExecutionStage ExecutionStage;
	public readonly string FunctionName;

	public FunctionMetrics(string functionName, ExecutionStage executionStage, long executionTimeInMS)
	{
		FunctionName = functionName;
		ExecutionStage = executionStage;
		ExecutionTimeInMS = executionTimeInMS;
	}
}

public enum ExecutionStage : byte
{
	Started,
	InProgress,
	Finished,
	Failed,
	Succeeded
}
```

Results:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| LinqMethod | 1,585.7 μs | 30.33 μs | 36.10 μs | 74.2188 | 27.3438 | 1,228 KB |
| DictionaryMethod | 829.9 μs | 13.61 μs | 12.06 μs | - | - | 11 KB |
| DictionaryMarshallMethod | 585.1 μs | 11.32 μs | 13.47 μs | - | - | 11 KB |
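For reference, a minimal sketch of the atomic-swap part of that note (illustrative only; the class and method names are hypothetical, and it reuses the FunctionMetrics struct from the benchmark above):

```csharp
using System.Collections.Concurrent;
using System.Threading;

// The flusher publishes a fresh queue for incoming events and aggregates the old
// instance outside any lock.
public class MetricsQueueSwapper
{
    private ConcurrentQueue<FunctionMetrics> _functionMetricsQueue =
        new ConcurrentQueue<FunctionMetrics>();

    public void Add(in FunctionMetrics metric) => _functionMetricsQueue.Enqueue(metric);

    public ConcurrentQueue<FunctionMetrics> SwapForFlush()
    {
        // Note: a writer that captured the old reference just before the swap can still
        // enqueue into it afterward, so a strict implementation would re-drain the old
        // queue (or count with Interlocked, as discussed later in this thread).
        return Interlocked.Exchange(
            ref _functionMetricsQueue,
            new ConcurrentQueue<FunctionMetrics>());
    }
}
```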

NickCraver · Dec 14 '21 23:12

/azp run

NickCraver · Dec 22 '21 00:12

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] · Dec 22 '21 00:12

/azp run

NickCraver · Jan 11 '22 21:01

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] · Jan 11 '22 21:01

pinging @brettsam & @mathewc for a second pass!

liliankasem · Mar 25 '22 18:03

> pinging @brettsam & @mathewc for a second pass!

bumping again :)

liliankasem · Oct 20 '22 23:10

@fabiocav @mathewc Dean and I see 2 places to switch to Interlocked (107/132) to get more accuracy than the existing (pre-PR) code, which can drop some events, but I don't want to set everything up again if we're not going to do anything with it anyway. Should I just close this PR, or try to update it later? There's a high time cost to re-profiling, so I'm asking before chasing anything here.
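For context, the kind of change being referred to is roughly the following (hypothetical names, not the PR's actual code): replace the plain ++ on the queued-event counters with Interlocked.Increment so concurrent callers can't lose updates.

```csharp
using System.Threading;

// A plain counter++ is a non-atomic read-modify-write, so concurrent
// FunctionStarted/FunctionCompleted callers can lose increments; Interlocked makes it exact.
public class QueuedEventCounter
{
    private long _count;

    public long Count => Volatile.Read(ref _count);

    // Before: _count++;                                              // racy under concurrency
    public void Increment() => Interlocked.Increment(ref _count);     // atomic
}
```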

NickCraver · Dec 15 '22 13:12

@NickCraver please go ahead with the changes. I've discussed this with the team and assigned some people to help drive the review (also including @mathewc).

fabiocav · Jan 13 '23 17:01

> @NickCraver please go ahead with the changes. I've discussed this with the team and assigned some people to help drive the review (also including @mathewc).

@NickCraver any updates on this?

liliankasem · Feb 16 '23 22:02

@liliankasem @fabiocav I'm swamped in Antares for the foreseeable future and don't have this set up after a machine pave. The totality of the changes I'm talking about is changing queuedEvent.Count++; and queuedEvent.Value.Count++; to Interlocked.Increment for total accuracy - is that something the team can take as part of this? (The team should have push permissions to the branch.) It'll be a while before I get back to Functions in general with the current queue - apologies!

NickCraver · Feb 16 '23 22:02

> @liliankasem @fabiocav I'm swamped in Antares for the foreseeable future and don't have this set up after a machine pave. The totality of the changes I'm talking about is changing queuedEvent.Count++; and queuedEvent.Value.Count++; to Interlocked.Increment for total accuracy - is that something the team can take as part of this? (The team should have push permissions to the branch.) It'll be a while before I get back to Functions in general with the current queue - apologies!

No worries at all, thank you for updating us! I'll sync with @fabiocav about what we can do to get this merged from our side

liliankasem · Feb 16 '23 23:02

Opened issue to address this in v5: https://github.com/Azure/azure-functions-host/issues/9173

liliankasem · Mar 22 '23 20:03