Improve CPU/Memory metrics collection at Akka.Cluster.Metrics
Introduction
Once #4126 is merged, we will need to improve the metrics loading implemented in DefaultCollector.
The basic idea is to collect:
- Current CPU load on the node, in %
- Current Memory usage, in %
Using that information, AdaptiveLoadBalancingRoutingLogic will calculate the availability of each node and perform "smart" routing.
Ideally, we should collect a more complete list:
- CPU usage by current process
- CPU total usage on the node machine (this will be used for routing)
- Memory usage by current process (in bytes)
- Memory total usage on the node machine
- Memory available on the node machine (together with the previous one this gives total utilization in %)
What we have right now
CPU
There is no API available in netstandard2.1 to collect CPU metrics out of the box, like PerformanceCounters. So what we are doing now is using the Process.TotalProcessorTime property to get the processor time dedicated to the current process. Knowing the total wall-clock time elapsed, we can estimate the CPU usage of the current process.
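For illustration, here is a minimal sketch of that estimation (not the actual DefaultCollector code): sample Process.TotalProcessorTime twice and divide the delta by the wall-clock time available across all logical processors.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Sketch: estimate current process CPU usage from two TotalProcessorTime samples.
class CpuUsageSketch
{
    public static double EstimateProcessCpuUsage(TimeSpan sampleInterval)
    {
        var process = Process.GetCurrentProcess();

        var cpuBefore = process.TotalProcessorTime;
        var wallBefore = DateTime.UtcNow;

        Thread.Sleep(sampleInterval);

        process.Refresh(); // re-read cached process information
        var cpuAfter = process.TotalProcessorTime;
        var wallAfter = DateTime.UtcNow;

        // CPU time consumed by this process divided by the wall-clock time
        // available across all logical cores gives a 0..1 utilization estimate.
        var cpuDelta = (cpuAfter - cpuBefore).TotalMilliseconds;
        var wallDelta = (wallAfter - wallBefore).TotalMilliseconds * Environment.ProcessorCount;

        return wallDelta <= 0 ? 0 : Math.Min(1.0, cpuDelta / wallDelta);
    }
}
```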
But for total CPU usage, this approach would require getting information about all processes with Process.GetProcesses(), which is very time-consuming when there are lots of processes (especially since we have to deal with access-violation exceptions here).
So total CPU usage is currently just the same as the current process' CPU usage. This is more or less fine for routing based on .NET process load, but not ideal if there are other heavy processes running on the machine.
Memory
Candidate list includes:
- GC.GetTotalMemory to get the currently allocated managed memory size. There is also GC.GetGCMemoryInfo, which returns a struct with a TotalAvailableMemoryBytes property, but that method is only available in .NET Core 3.0, and we are targeting netstandard2.1
- PerformanceCounters, which work under Windows, and there is a Mono implementation. There are some other Windows-only ways to get metrics.
- The Process class, which provides multiple memory-related properties
- Using P/Invoke and working with native APIs
- Getting the output of some shell commands, specific to each OS
Currently, we are using the cross-platform sources available for netstandard2.1 - the Process class.
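As a quick illustration of how these candidates compare side by side (a sketch, not what DefaultCollector does verbatim), the Process-based and GC-based numbers can be read like this:

```csharp
using System;
using System.Diagnostics;

// Sketch comparing the memory numbers discussed above; the property names are
// the real System.Diagnostics.Process / GC APIs, but how they map onto "used"
// and "available" memory is exactly the open question in this issue.
class MemorySnapshotSketch
{
    public static void Print()
    {
        var process = Process.GetCurrentProcess();
        process.Refresh();

        Console.WriteLine($"GC.GetTotalMemory:   {GC.GetTotalMemory(forceFullCollection: false)}"); // managed heap only
        Console.WriteLine($"PrivateMemorySize64: {process.PrivateMemorySize64}"); // private bytes, managed + unmanaged
        Console.WriteLine($"WorkingSet64:        {process.WorkingSet64}");        // physical memory currently mapped
        Console.WriteLine($"VirtualMemorySize64: {process.VirtualMemorySize64}"); // reserved virtual address space
    }
}
```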
First issue
Same as for CPU: it is quite heavy to get information about all processes. So the current implementation treats MemoryUsage as the current process' usage, which is useful, but not ideal for routing between nodes.
Second issue
Another issue is understanding what "used" memory means, and getting "available" memory info.
To track unmanaged memory as well as managed, Process.PrivateMemorySize64 is used instead of GC.GetTotalMemory. It works well by itself, but it is hard to know the upper limit for this value, because it is not the physical memory allocated from RAM (see documentation).
Getting "available" memory is much more tricky, and I did not find anything available under .NET Core sdk to get this value. Ideally would be getting available size of installed physical memory (or available part of it in cloud environment). So far, the Process.VirtualMemorySize64 is used - but is is just a number of bytes in virtual address space, and does not correlate much with really available memory. But still it is one of the upper bounds for available memory, and can be used to get % of memory load (relative to other node).
In my understanding, ideally we would load the Available MBytes PerformanceCounter (but on all platforms) to get available memory, plus some way to get the total installed physical memory. These two would allow us to calculate the % of used memory on the node and perform routing, and we could provide the various Process properties in addition, like WorkingSet, PrivateMemorySize64, and others.
Maybe there is some other convenient approach. The main idea here is that while the current used / available relation is Process.PrivateMemorySize64 / Process.VirtualMemorySize64, it is always in the range [0, 1] and reflects the memory load, so we can compare nodes based on it. But a value of 0.5 does not guarantee that there is any available memory on the node at all, so we need more accurate values to calculate the node's memory capacity.
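For reference, the relation described above boils down to something like this sketch (the property names are the real Process APIs; the helper itself is just illustrative):

```csharp
using System.Diagnostics;

static class MemoryLoadSketch
{
    // Rough shape of the current "used / available" relation. The ratio stays
    // in [0, 1] and lets nodes be compared, but a value of 0.5 does not
    // guarantee that any physical memory is actually free on the machine.
    public static double MemoryLoad(Process process)
    {
        process.Refresh();
        return (double)process.PrivateMemorySize64 / process.VirtualMemorySize64;
    }
}
```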
Also, in Scala they use the Sigar library, which seems to have bindings for .NET. @Aaronontheweb Should we port this? It would require users to have the binaries of this library for their OS, but it may be a workable approach anyway.
Good to know @IgorFedchenko - I had no idea that they had .NET support.
Do you know if that library does anything that requires elevated permissions?
I cannot find any particular permission requirements in related articles; it seems the quickest way is to download the binaries from here and check myself (here are some code samples I found).
So we need to give it a try once we work on this issue. Here is a nice Wiki for the library.
@Aaronontheweb found some nice samples for how to more accurately measure CPU utilization:
https://github.com/dotnet/runtime/blob/4dc2ee1b5c0598ca02a69f63d03201129a3bf3f1/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.CpuUtilizationReader.Windows.cs
https://github.com/dotnet/runtime/blob/4dc2ee1b5c0598ca02a69f63d03201129a3bf3f1/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.CpuUtilizationReader.Unix.cs
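For reference, the Windows reader linked above is built around deltas of kernel32's GetSystemTimes; here is a hedged, self-contained sketch of that idea (not the runtime's actual code, which goes through its own interop layer, and the Unix reader in the second link reads the equivalent numbers from a native shim):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Sketch: machine-wide CPU utilization on Windows from GetSystemTimes deltas.
static class SystemCpuUsageWindowsSketch
{
    [StructLayout(LayoutKind.Sequential)]
    private struct FILETIME
    {
        public uint Low;
        public uint High;
        public ulong Ticks => ((ulong)High << 32) | Low;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetSystemTimes(out FILETIME idle, out FILETIME kernel, out FILETIME user);

    // Returns machine-wide CPU utilization in [0, 1] over the given interval.
    public static double Sample(TimeSpan interval)
    {
        if (!GetSystemTimes(out var idle1, out var kernel1, out var user1)) return 0;
        Thread.Sleep(interval);
        if (!GetSystemTimes(out var idle2, out var kernel2, out var user2)) return 0;

        // Kernel time already includes idle time, so busy = (kernel + user) - idle.
        var idle = idle2.Ticks - idle1.Ticks;
        var total = (kernel2.Ticks - kernel1.Ticks) + (user2.Ticks - user1.Ticks);
        return total == 0 ? 0 : 1.0 - (double)idle / total;
    }
}
```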
Comparison between data collected using Akka.Cluster.Metrics and data collected using dotnet built-in perf counter that is available in .NET 6.0
Chart 1. Memory consumption, working set is not included because it makes the chart harder to read

Chart 2. CPU usage

Comparing CPU load measurements between Akka.Cluster.Metrics and the perf counter on Windows, they are quite accurate.
The memory comparison, however, is quite confusing. What does GC.GetTotalMemory() actually measure?
@Arkatufus https://stackoverflow.com/a/7455860/377476
Basically the GC.TotalMemory only measures memory allocated onto the heap via the garbage collector - any unmanaged memory, stack memory, and so on can't be measured there. Might be better to use one of the other measures suggested in the SO issue I linked (i.e. Process.TotalWorkingSet) - but those may have accuracy issues as well.
This is what the graph would look like if I include the total working set (reported by the perf counter). The difference is quite big; it's twice the size.

What about what I suggested above @Arkatufus ? Avoiding perf counters is a good idea given that they aren't x-plat - want some sort of abstraction that works on all supported runtimes.
FWIW In the past I've found that querying Process and running math on times at a given sampling rate works well for CPU usages, as well as a combination of the memory queries given in the SO post alongside GC stats. (It's pretty useful in some cases to see both, has helped me debug at least one IIS-hosting issue.)
Actually, come to think of it, I frequently saw some of the WorkingSets stay fairly constant depending on environment, so the GC numbers still had a lot of value in seeing spikes/etc.
https://github.com/akkadotnet/akka.net/blob/a69d7787a223c4f60ecbe6d7e46897fe6f42ff19/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs#L62 - we should probably not be forcing a full GC here each time we sample. No bueno. We just need to report on what current usage looks like, without side effects (plus this causes the current thread to block until the GC is complete.)
https://github.com/akkadotnet/akka.net/blob/a69d7787a223c4f60ecbe6d7e46897fe6f42ff19/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs#L64 - this value measures only virtual memory. Definitely not what users are interested in - we should have three separate memory counters:
- Process.WorkingSet64 - allows us to capture total allocated physical memory. Doesn't measure utilization exactly, but utilization for busy processes will be correlated to allocation.
- GC.GetTotalMemory - allows us to capture how much memory is being used currently by .NET managed objects. For the majority of users, this is the most practical measure of end-to-end utilization.
- Process.VirtualMemorySize64 - if this number is going up, your performance might go down. It means that more and more working set memory is being offloaded to disk - not necessarily a perf hit unless page faults also go up.
IMHO, we should probably just track WorkingSet64 and GC.GetTotalMemory - for routing purposes that's probably accurate enough.
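A sketch of what a sample restricted to those two counters might look like (illustrative only, not the actual DefaultCollector change; note it avoids forcing a collection):

```csharp
using System;
using System.Diagnostics;

// Sketch: sample just the two suggested counters for routing purposes.
class MemorySample
{
    public long WorkingSetBytes { get; private set; } // physical memory currently mapped to the process
    public long GcHeapBytes { get; private set; }     // managed heap, as currently allocated

    public static MemorySample Take(Process process)
    {
        process.Refresh(); // make sure cached Process values are current for this sampling interval
        return new MemorySample
        {
            WorkingSetBytes = process.WorkingSet64,
            // forceFullCollection: false -> report current usage with no GC side effects
            GcHeapBytes = GC.GetTotalMemory(forceFullCollection: false)
        };
    }
}
```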
In .NET 6, as @Arkatufus pointed out in our call this morning, we can dual target and add support for the new x-plat runtime performance APIs: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/available-counters
What's the difference between Process.WorkingSet64 and Environment.WorkingSet?
https://learn.microsoft.com/en-us/dotnet/api/system.environment.workingset?view=netstandard-2.0#system-environment-workingset
Both APIs are available in .NET Standard 2.0.
Are we calling Process.Refresh() during sampling intervals?
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.refresh?view=netstandard-2.0#system-diagnostics-process-refresh
We can implement the latest Microsoft cross-platform performance metrics EventCounters and retrieve the System.Runtime counters in v1.5. We can't backport it to v1.4 because it's not available in .NET runtime 3.0 and below.
https://learn.microsoft.com/en-us/dotnet/core/diagnostics/event-counters
I've used it to collect the comparison data to create the graphs above, it's a lot easier to use since we only have a single source of truth to get all our numbers from.
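For anyone curious, the in-process way to read those counters is an EventListener subscribed to the System.Runtime provider. A rough sketch (the counter payload keys and names such as "cpu-usage", "working-set", and "gc-heap-size" come from the documented provider; the listener shape is just an assumption about how we could consume them, not the actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Sketch: consume the built-in System.Runtime EventCounters in-process.
sealed class RuntimeCounterListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Runtime")
        {
            // Ask the runtime to publish counter values once per second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload == null) return;

        foreach (var payload in eventData.Payload)
        {
            if (payload is IDictionary<string, object> counter &&
                counter.TryGetValue("Name", out var name))
            {
                // "Mean" is populated for polling counters like cpu-usage and working-set.
                counter.TryGetValue("Mean", out var mean);
                Console.WriteLine($"{name}: {mean}");
            }
        }
    }
}
```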
These are the graphs from WSL2, running with 4 virtual CPUs and 6 GB of RAM



CPU numbers look good x-plat on both of the tested platforms so far - and the memory tracking numbers are consistently off on both platforms, which makes me think it's just a matter of calling Process.Refresh during sampling and using the right metrics values. This will be easier than we thought - the built-in metrics are actually pretty good for our purposes.
Here are the graphs after the changes; some values make sense, others just don't make any sense.
Graph 1, CPU usage, no code change:

Graph 2, Memory usage, data marked with (A) are from Akka.Cluster.Metrics:

- (A) used: changed to not force GC
- (A) available: changed from Process.VirtualMemorySize64 to Process.WorkingSet64
- Added a new metric StandardMetrics.MemoryVirtual and set it to Process.VirtualMemorySize64, but it reads as 2 terabytes so I'm not including it in the graph
- Managed to record StandardMetrics.MaxMemoryRecommended but it is of no use; Process.MaxWorkingSet always reads as 1.34 megabytes
Graph 3. Adjusted virtual memory. If I remove the excess number from Process.VirtualMemorySize64, it actually has some usable value.

Forgot to mention that I induced artificial memory pressure during the test; that's why the memory chart looks different.
I'm seeing weird behavior when I'm using Process.WorkingSet64 and GC.GetTotalMemory() inside MNTR.
Working set is supposed to be the total memory being allocated in physical memory and GC total is supposed to be the total memory allocated to the GC heap (gen-0 + gen-1 + gen-2 + LOH + POH).
This result is consistent over multiple MNTR runs; the reported node 1 GC heap is always bigger than the working set, which isn't supposed to happen.
| Time | Node 1 WorkingSet | GC.Total | Node 2 WorkingSet | GC.Total | Node 3 WorkingSet | GC.Total |
|---|---|---|---|---|---|---|
| 0 | 70.65 | 86.21 | 69.19 | 36.14 | 68.96 | 36.13 |
| 0.478 | 71.65 | 87.60 | 69.99 | 37.70 | 69.52 | 37.64 |
| 1.487 | 71.91 | 87.92 | 70.31 | 38.01 | 69.95 | 38.07 |
| 2.496 | 72.14 | 88.32 | 70.58 | 38.42 | 70.06 | 38.34 |
| 3.508 | 72.21 | 88.63 | 70.70 | 38.70 | 70.22 | 38.78 |
| 4.515 | 72.33 | 88.84 | 70.75 | 39.11 | 70.30 | 39.06 |
| 5.528 | 72.43 | 89.11 | 70.80 | 39.47 | 70.39 | 39.36 |
| 6.54 | 72.62 | 89.46 | 70.85 | 39.74 | 70.41 | 39.56 |
| 7.547 | 72.96 | 89.76 | 70.90 | 40.03 | 70.44 | 39.84 |
| 8.558 | 73.34 | 89.98 | 71.04 | 40.32 | 70.55 | 40.11 |
| 9.568 | 76.50 | 91.24 | 72.68 | 41.12 | 72.19 | 41.01 |
| 10.584 | 78.09 | 87.48 | 72.94 | 41.83 | 72.43 | 41.79 |
So the only reason our MNTR specs are passing today:
https://github.com/akkadotnet/akka.net/blob/8bf4a613d1d9ba4c3424c2283369deb470d3af43/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs#L61-L64
process.VirtualMemorySize64 will always return 2.1 TB of memory - which is vastly higher than what most Akka.NET applications will have access to. It's not a good metric in that it doesn't really approximate total physical memory usage on a system, but since that value is always going to be higher than what GC.GetTotalMemory reports, the tests will pass.
closed via https://github.com/akkadotnet/akka.net/pull/6203