Improve CPU/Memory metrics collection at Akka.Cluster.Metrics
Introduction
Once #4126 is merged, we will need to improve the metrics loading implemented in DefaultCollector.
The basic idea is to collect:
- Current CPU load on the node, in %
- Current Memory usage, in %
Using that information, AdaptiveLoadBalancingRoutingLogic will calculate the availability of each node and perform "smart" routing.
Ideally, we should collect a more complete list:
- CPU usage by current process
- CPU total usage on the node machine (this will be used for routing)
- Memory usage by current process (in bytes)
- Memory total usage on the node machine
- Memory available on the node machine (together with the previous one this gives total utilization in %)
What we have right now
CPU
There is no API available in netstandard2.1 to collect CPU metrics out of the box, like PerformanceCounters. So what we are doing now is using the Process.TotalProcessorTime property to get the processor time dedicated to the current process. Knowing the total wall-clock time elapsed, we can estimate the CPU usage of the current process.
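For illustration, here is a minimal sketch of that estimation (not the actual DefaultCollector code): sample Process.TotalProcessorTime twice and divide the delta by the wall-clock time available across all logical processors.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Sketch: estimate current process CPU usage from two TotalProcessorTime samples.
class CpuUsageSketch
{
    public static double EstimateProcessCpuUsage(TimeSpan sampleInterval)
    {
        var process = Process.GetCurrentProcess();

        var cpuBefore = process.TotalProcessorTime;
        var wallBefore = DateTime.UtcNow;

        Thread.Sleep(sampleInterval);

        process.Refresh(); // re-read cached process information
        var cpuAfter = process.TotalProcessorTime;
        var wallAfter = DateTime.UtcNow;

        // CPU time consumed by this process divided by the wall-clock time
        // available across all logical cores gives a 0..1 utilization estimate.
        var cpuDelta = (cpuAfter - cpuBefore).TotalMilliseconds;
        var wallDelta = (wallAfter - wallBefore).TotalMilliseconds * Environment.ProcessorCount;

        return wallDelta <= 0 ? 0 : Math.Min(1.0, cpuDelta / wallDelta);
    }
}
```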
But for total CPU usage, this approach would require getting information about all processes with Process.GetProcesses(), which is very time-consuming when there are lots of processes (especially since we have to deal with access-violation exceptions here).
So total CPU usage is currently just the same as the current process' CPU usage. This is more or less fine for routing based on .NET process load, but not ideal if there are other heavy processes running on the machine.
Memory
Candidate list includes:
- GC.GetTotalMemory to get the currently allocated managed memory size. There is also GC.GetGCMemoryInfo, which returns a struct with a TotalAvailableMemoryBytes property, but that method is only available in .NET Core 3.0, and we are targeting netstandard2.1
- PerformanceCounters, which work under Windows, and there is a Mono implementation. There are some other Windows-only ways to get metrics.
- The Process class, which provides multiple memory-related properties
- Using P/Invoke and working with native APIs
- Getting the output of some shell commands, specific to each OS
Currently, we are using the cross-platform sources available for netstandard2.1 - the Process class.
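As a quick illustration of how these candidates compare side by side (a sketch, not what DefaultCollector does verbatim), the Process-based and GC-based numbers can be read like this:

```csharp
using System;
using System.Diagnostics;

// Sketch comparing the memory numbers discussed above; the property names are
// the real System.Diagnostics.Process / GC APIs, but how they map onto "used"
// and "available" memory is exactly the open question in this issue.
class MemorySnapshotSketch
{
    public static void Print()
    {
        var process = Process.GetCurrentProcess();
        process.Refresh();

        Console.WriteLine($"GC.GetTotalMemory:   {GC.GetTotalMemory(forceFullCollection: false)}"); // managed heap only
        Console.WriteLine($"PrivateMemorySize64: {process.PrivateMemorySize64}"); // private bytes, managed + unmanaged
        Console.WriteLine($"WorkingSet64:        {process.WorkingSet64}");        // physical memory currently mapped
        Console.WriteLine($"VirtualMemorySize64: {process.VirtualMemorySize64}"); // reserved virtual address space
    }
}
```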
First issue
Same as for CPU: it is quite heavy to get information about all processes. So the current implementation treats MemoryUsage as the current process' usage, which is useful, but not ideal for routing between nodes.
Second issue
Another issue is understanding what "used" memory means, and getting "available" memory info.
To track unmanaged memory as well as managed, Process.PrivateMemorySize64 is used instead of GC.GetTotalMemory. It works well by itself, but it is hard to know the upper limit for this value, because it is not the physical memory allocated from RAM (see documentation).
Getting "available" memory is much more tricky, and I did not find anything available under .NET Core sdk to get this value. Ideally would be getting available size of installed physical memory (or available part of it in cloud environment). So far, the Process.VirtualMemorySize64 is used - but is is just a number of bytes in virtual address space, and does not correlate much with really available memory. But still it is one of the upper bounds for available memory, and can be used to get % of memory load (relative to other node).
In my understanding, ideally we would load the Available MBytes PerformanceCounter (but on all platforms) to get available memory, plus some way to get the total installed physical memory. These two would allow us to calculate the % of used memory on the node and perform routing, and we could provide the various Process properties in addition, like WorkingSet, PrivateMemorySize64, and others.
Maybe there is some other convenient approach. The main idea here is that while the current used / available relation is Process.PrivateMemorySize64 / Process.VirtualMemorySize64, it is always in the range [0, 1] and reflects the memory load, so we can compare nodes based on it. But a value of 0.5 does not guarantee that there is any available memory on the node at all, so we need more accurate values to calculate the node's memory capacity.
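For reference, the relation described above boils down to something like this sketch (the property names are the real Process APIs; the helper itself is just illustrative):

```csharp
using System.Diagnostics;

static class MemoryLoadSketch
{
    // Rough shape of the current "used / available" relation. The ratio stays
    // in [0, 1] and lets nodes be compared, but a value of 0.5 does not
    // guarantee that any physical memory is actually free on the machine.
    public static double MemoryLoad(Process process)
    {
        process.Refresh();
        return (double)process.PrivateMemorySize64 / process.VirtualMemorySize64;
    }
}
```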
Also, in Scala they use the Sigar library, which seems to have bindings for .NET. @Aaronontheweb Should we port this? It would require users to have the binaries of this library for their OS, but it may be a workable approach anyway.
Good to know @IgorFedchenko - I had no idea that they had .NET support.
Do you know if that library does anything that requires elevated permissions?
I cannot find any particular permission requirements in related articles; it seems the quickest way is to download the binaries from here and check myself (here are some code samples I found).
So we need to give it a try once we work on this issue. Here is a nice Wiki for the library.
@Aaronontheweb found some nice samples for how to more accurately measure CPU utilization:
https://github.com/dotnet/runtime/blob/4dc2ee1b5c0598ca02a69f63d03201129a3bf3f1/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.CpuUtilizationReader.Windows.cs
https://github.com/dotnet/runtime/blob/4dc2ee1b5c0598ca02a69f63d03201129a3bf3f1/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.CpuUtilizationReader.Unix.cs
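For reference, the Windows reader linked above is built around deltas of kernel32's GetSystemTimes; here is a hedged, self-contained sketch of that idea (not the runtime's actual code, which goes through its own interop layer, and the Unix reader in the second link reads the equivalent numbers from a native shim):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Sketch: machine-wide CPU utilization on Windows from GetSystemTimes deltas.
static class SystemCpuUsageWindowsSketch
{
    [StructLayout(LayoutKind.Sequential)]
    private struct FILETIME
    {
        public uint Low;
        public uint High;
        public ulong Ticks => ((ulong)High << 32) | Low;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetSystemTimes(out FILETIME idle, out FILETIME kernel, out FILETIME user);

    // Returns machine-wide CPU utilization in [0, 1] over the given interval.
    public static double Sample(TimeSpan interval)
    {
        if (!GetSystemTimes(out var idle1, out var kernel1, out var user1)) return 0;
        Thread.Sleep(interval);
        if (!GetSystemTimes(out var idle2, out var kernel2, out var user2)) return 0;

        // Kernel time already includes idle time, so busy = (kernel + user) - idle.
        var idle = idle2.Ticks - idle1.Ticks;
        var total = (kernel2.Ticks - kernel1.Ticks) + (user2.Ticks - user1.Ticks);
        return total == 0 ? 0 : 1.0 - (double)idle / total;
    }
}
```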
Comparison between data collected using Akka.Cluster.Metrics and data collected using dotnet built-in perf counter that is available in .NET 6.0
Chart 1. Memory consumption, working set is not included because it makes the chart harder to read

Chart 2. CPU usage

Comparing CPU load measurements between Akka.Cluster.Metrics and the perf counter on Windows, they are quite accurate.
The memory comparison, however, is quite confusing. What does GC.GetTotalMemory() actually measure?
@Arkatufus https://stackoverflow.com/a/7455860/377476
Basically the GC.TotalMemory only measures memory allocated onto the heap via the garbage collector - any unmanaged memory, stack memory, and so on can't be measured there. Might be better to use one of the other measures suggested in the SO issue I linked (i.e. Process.TotalWorkingSet) - but those may have accuracy issues as well.
This is what the graph would look like if I include the total working set (reported by the perf counter). The difference is quite big; it's twice the size.

What about what I suggested above @Arkatufus ? Avoiding perf counters is a good idea given that they aren't x-plat - want some sort of abstraction that works on all supported runtimes.
FWIW In the past I've found that querying Process and running math on times at a given sampling rate works well for CPU usages, as well as a combination of the memory queries given in the SO post alongside GC stats. (It's pretty useful in some cases to see both, has helped me debug at least one IIS-hosting issue.)
Actually, come to think of it, I frequently saw some of the WorkingSets stay fairly constant depending on environment, so the GC numbers still had a lot of value in seeing spikes/etc.
https://github.com/akkadotnet/akka.net/blob/a69d7787a223c4f60ecbe6d7e46897fe6f42ff19/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs#L62 - we should probably not be forcing a full GC here each time we sample. No bueno. We just need to report on what current usage looks like, without side effects (plus this causes the current thread to block until the GC is complete.)
https://github.com/akkadotnet/akka.net/blob/a69d7787a223c4f60ecbe6d7e46897fe6f42ff19/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs#L64 - this value measures only virtual memory. Definitely not what users are interested in - we should have three separate memory counters:
- Process.WorkingSet64 - allows us to capture total allocated physical memory. Doesn't measure utilization exactly, but utilization for busy processes will be correlated to allocation.
- GC.GetTotalMemory - allows us to capture how much memory is being used currently by .NET managed objects. For the majority of users, this is the most practical measure of end-to-end utilization.
- Process.VirtualMemorySize64 - if this number is going up, your performance might go down. It means that more and more working set memory is being offloaded to disk - not necessarily a perf hit unless page faults also go up.
IMHO, we should probably just track WorkingSet64 and GC.GetTotalMemory - for routing purposes that's probably accurate enough.
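A sketch of what a sample restricted to those two counters might look like (illustrative only, not the actual DefaultCollector change; note it avoids forcing a collection):

```csharp
using System;
using System.Diagnostics;

// Sketch: sample just the two suggested counters for routing purposes.
class MemorySample
{
    public long WorkingSetBytes { get; private set; } // physical memory currently mapped to the process
    public long GcHeapBytes { get; private set; }     // managed heap, as currently allocated

    public static MemorySample Take(Process process)
    {
        process.Refresh(); // make sure cached Process values are current for this sampling interval
        return new MemorySample
        {
            WorkingSetBytes = process.WorkingSet64,
            // forceFullCollection: false -> report current usage with no GC side effects
            GcHeapBytes = GC.GetTotalMemory(forceFullCollection: false)
        };
    }
}
```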
In .NET 6, as @Arkatufus pointed out in our call this morning, we can dual target and add support for the new x-plat runtime performance APIs: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/available-counters
What's the difference between Process.WorkingSet64 and Environment.WorkingSet?
https://learn.microsoft.com/en-us/dotnet/api/system.environment.workingset?view=netstandard-2.0#system-environment-workingset
Both APIs are available in .NET Standard 2.0.
Are we calling Process.Refresh() during sampling intervals?
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.refresh?view=netstandard-2.0#system-diagnostics-process-refresh
We can implement the latest Microsoft cross-platform performance metrics EventCounters and retrieve the System.Runtime counters in v1.5. We can't backport it to v1.4 because it's not available in .NET runtime 3.0 and below.
https://learn.microsoft.com/en-us/dotnet/core/diagnostics/event-counters
I've used it to collect the comparison data to create the graphs above, it's a lot easier to use since we only have a single source of truth to get all our numbers from.
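For anyone curious, the in-process way to read those counters is an EventListener subscribed to the System.Runtime provider. A rough sketch (the counter payload keys and names such as "cpu-usage", "working-set", and "gc-heap-size" come from the documented provider; the listener shape is just an assumption about how we could consume them, not the actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Sketch: consume the built-in System.Runtime EventCounters in-process.
sealed class RuntimeCounterListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Runtime")
        {
            // Ask the runtime to publish counter values once per second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload == null) return;

        foreach (var payload in eventData.Payload)
        {
            if (payload is IDictionary<string, object> counter &&
                counter.TryGetValue("Name", out var name))
            {
                // "Mean" is populated for polling counters like cpu-usage and working-set.
                counter.TryGetValue("Mean", out var mean);
                Console.WriteLine($"{name}: {mean}");
            }
        }
    }
}
```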
These are the graphs from WSL2, running with 4 virtual CPUs and 6 GB of RAM



CPU numbers look good x-plat on both of the tested platforms so far - and the memory tracking numbers are consistently off on both platforms, which makes me think it's just a matter of calling Process.Refresh during sampling and using the right metrics values. This will be easier than we thought - the built-in metrics are actually pretty good for our purposes.
Here are the graphs after the changes; some values make sense, others just don't make any sense.
Graph 1, CPU usage, no code change:

Graph 2, Memory usage, data marked with (A) are from Akka.Cluster.Metrics:

- (A) used: changed to not force GC
- (A) available: changed from Process.VirtualMemorySize64 to Process.WorkingSet64
- Added a new metric StandardMetrics.MemoryVirtual and set it to Process.VirtualMemorySize64, but it reads as 2 terabytes so I'm not including it in the graph
- Managed to record StandardMetrics.MaxMemoryRecommended but it is of no use; Process.MaxWorkingSet always reads as 1.34 megabytes
Graph 3. Adjusted virtual memory. If I remove the excess number from Process.VirtualMemorySize64, it actually has some usable value.

Forgot to mention that I induced artificial memory pressure during the test; that's why the memory chart looks different.
I'm seeing weird behavior when I'm using Process.WorkingSet64 and GC.GetTotalMemory() inside MNTR.
Working set is supposed to be the total memory being allocated in physical memory and GC total is supposed to be the total memory allocated to the GC heap (gen-0 + gen-1 + gen-2 + LOH + POH).
This result is consistent over multiple MNTR runs; the reported node 1 GC heap is always bigger than the working set, which isn't supposed to happen.
| Time | Node 1 WorkingSet | GC.Total | Node 2 WorkingSet | GC.Total | Node 3 WorkingSet | GC.Total |
|---|---|---|---|---|---|---|
| 0 | 70.65 | 86.21 | 69.19 | 36.14 | 68.96 | 36.13 |
| 0.478 | 71.65 | 87.60 | 69.99 | 37.70 | 69.52 | 37.64 |
| 1.487 | 71.91 | 87.92 | 70.31 | 38.01 | 69.95 | 38.07 |
| 2.496 | 72.14 | 88.32 | 70.58 | 38.42 | 70.06 | 38.34 |
| 3.508 | 72.21 | 88.63 | 70.70 | 38.70 | 70.22 | 38.78 |
| 4.515 | 72.33 | 88.84 | 70.75 | 39.11 | 70.30 | 39.06 |
| 5.528 | 72.43 | 89.11 | 70.80 | 39.47 | 70.39 | 39.36 |
| 6.54 | 72.62 | 89.46 | 70.85 | 39.74 | 70.41 | 39.56 |
| 7.547 | 72.96 | 89.76 | 70.90 | 40.03 | 70.44 | 39.84 |
| 8.558 | 73.34 | 89.98 | 71.04 | 40.32 | 70.55 | 40.11 |
| 9.568 | 76.50 | 91.24 | 72.68 | 41.12 | 72.19 | 41.01 |
| 10.584 | 78.09 | 87.48 | 72.94 | 41.83 | 72.43 | 41.79 |
So the only reason our MNTR specs are passing today:
https://github.com/akkadotnet/akka.net/blob/8bf4a613d1d9ba4c3424c2283369deb470d3af43/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs#L61-L64
process.VirtualMemorySize64 will always return 2.1 TB of memory - which is vastly higher than what most Akka.NET applications will have access to. It's not a good metric in that it doesn't really approximate total physical memory usage on a system, but since that value is always going to be higher than what GC.GetTotalMemory reports, the tests will pass.
closed via https://github.com/akkadotnet/akka.net/pull/6203