[PERF] Akka.Cluster Idle CPU on ARM

Open Aaronontheweb opened this issue 1 year ago • 12 comments

Version Information
Version of Akka.NET? v1.5.21
Which Akka.NET Modules? Akka.Cluster, Akka.Remote, Akka

Describe the performance issue

From a user in our Discord - it looks like Akka.Cluster has significantly higher idle CPU on Apple Silicon ARM chips than it does on x64 chips.

Data and Specs

image

Expected behavior

Idle CPU should be less than 1% per process across all platforms.

Actual behavior

Idle CPU can be as high as 28% on ARM.

Additional context

This is mostly a .NET runtime issue, but we should keep an eye on it in case there's something we're doing to exacerbate it or something we can do to mitigate it.

Aaronontheweb avatar Jun 03 '24 17:06 Aaronontheweb

A few years ago I posted some system APIs for reading the cycles consumed by a process on Windows/Linux. Maybe there is something newer or better in the dotnet SDK tools by now.

Zetanova avatar Jun 04 '24 16:06 Zetanova

> A few years ago I posted some system APIs for reading the cycles consumed by a process on Windows/Linux. Maybe there is something newer or better in the dotnet SDK tools by now.

I thought the biggest culprit for this would have been our DedicatedThreadPool, but these numbers are with those disabled - this is all using the built-in .NET ThreadPool.

Aaronontheweb avatar Jun 04 '24 16:06 Aaronontheweb

https://github.com/akkadotnet/akka.net/issues/5400#issuecomment-1020871040 https://github.com/akkadotnet/akka.net/issues/5400#issuecomment-1021952961

Maybe we can implement something like a Stopwatch with it, but for CPU cycles. It would be useful not only inside perf tests but maybe also inside the ActorCell scheduler algorithm.

Zetanova avatar Jun 04 '24 16:06 Zetanova

> A few years ago I posted some system APIs for reading the cycles consumed by a process on Windows/Linux. Maybe there is something newer or better in the dotnet SDK tools by now.

> I thought the biggest culprit for this would have been our DedicatedThreadPool, but these numbers are with those disabled - this is all using the built-in .NET ThreadPool.

I'm not talking about the issue itself, but about your measurements. In k8s and other "cloud" environments there are CPU units (m): https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

We don't need to use the same metrics. Reading the used CPU cycles from the OS would be optimal for unit tests and benchmarks. Maybe it's even possible to use them at runtime for workload measurement, scheduling, and health checks.

Zetanova avatar Jun 04 '24 16:06 Zetanova

Ah got it, you think this might just be an instrumentation issue then?

Aaronontheweb avatar Jun 04 '24 16:06 Aaronontheweb

Worth mentioning: I requisitioned all of the hardware for building a long-term Akka.NET observation lab yesterday https://x.com/Aaronontheweb/status/1797731816042049944

Going to have some experiments that are designed to run continuously for months in here, including idle CPU measurements. Bought a Raspberry Pi 5 for testing ARM support specifically.

Aaronontheweb avatar Jun 04 '24 16:06 Aaronontheweb

The distro/kernel version used can make a difference too.

Tip: don't write the logs/output to your SD card; it will wear out the card very quickly.

Zetanova avatar Jun 04 '24 17:06 Zetanova

I will make a demo project for the cycle measurement.

Zetanova avatar Jun 04 '24 17:06 Zetanova

> The distro/kernel version used can make a difference too.

> Tip: don't write the logs/output to your SD card; it will wear out the card very quickly.

Good idea - was planning on having a log-aggregator and OTEL running on a separate host (x64 instance)

Aaronontheweb avatar Jun 04 '24 17:06 Aaronontheweb

@Aaronontheweb here is the demo cycle watch: https://github.com/Zetanova/CycleReader It is currently only for Windows; I will add linux/OS in the next few days.

Zetanova avatar Jun 04 '24 18:06 Zetanova

@Aaronontheweb It's not possible to read a counter that gives a "CPU work done" value.

There are some registers in x64 and ARMv6+ for reading a thread's cycle count, but they are very hard to read from C#, and they are not useful as-is once the TaskPool gets involved.

The best unit to measure would be "CPU units", as Linux and cloud providers do. This is cpuUnits = processorTime / elapsedTime

Windows and Linux provide counters for process and thread CPU time, and the System.Diagnostics.Process class can be used to read them.

It can be used in simple integration tests to measure the idle cluster CPU, or the CPU utilization of a calibration workload, in order to compare OS/architecture combinations.
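
As a worked example of this ratio (the figures are illustrative, not measurements from this issue): a process that accumulates 100 ms of processor time over a 10 s window gives

$$\text{cpuUnits} = \frac{\text{processorTime}}{\text{elapsedTime}} = \frac{0.1\,\text{s}}{10\,\text{s}} = 0.01$$

which is 1% of one core, or 10m in Kubernetes millicore notation (1 CPU = 1000m). Because processor time is summed over all threads of the process, the value can exceed 1 on a multi-core machine.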

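Simplest form of an idle integration test:

using System.Diagnostics;
using System.Threading.Tasks;

var p = Process.GetCurrentProcess();
var sw = new Stopwatch();

// CPU time consumed by the process so far (user + kernel, summed over all threads)
var processorTime0 = p.TotalProcessorTime;
sw.Start();

// do work or idle around
await Task.Delay(10_000);

var processorTime1 = p.TotalProcessorTime;
sw.Stop();

// TimeSpan / TimeSpan yields a double: CPU seconds used per wall-clock second
var processorTime = processorTime1 - processorTime0;
var cpuUnits = processorTime / sw.Elapsed;

// was idling? (less than 1% of one core)
// Assert comes from the test framework in use (e.g. xUnit)
Assert.True(cpuUnits < 0.01);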

I put the tests in the above repo.
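
For reuse across tests, the measurement above could be wrapped in a small helper. This is only a sketch: `CpuUnitsProbe`, `MeasureAsync`, and `ToMillicores` are hypothetical names, not part of Akka.NET or the CycleReader repo, and it assumes .NET Core 2.0 or later, where `TimeSpan` division is available.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative and not taken from any library.
public static class CpuUnitsProbe
{
    // Runs the workload and returns CPU seconds consumed per wall-clock second
    // by the current process (can exceed 1.0 when several cores are busy).
    public static async Task<double> MeasureAsync(Func<Task> workload)
    {
        using var process = Process.GetCurrentProcess();

        var cpuBefore = process.TotalProcessorTime;
        var stopwatch = Stopwatch.StartNew();

        await workload();

        stopwatch.Stop();
        process.Refresh(); // discard any cached process info before re-reading
        var cpuAfter = process.TotalProcessorTime;

        // TimeSpan / TimeSpan returns a double ratio.
        return (cpuAfter - cpuBefore) / stopwatch.Elapsed;
    }

    // Converts CPU units to Kubernetes-style millicores (1 CPU = 1000m).
    public static double ToMillicores(double cpuUnits) => cpuUnits * 1000.0;
}
```

An idle check would then read `var units = await CpuUnitsProbe.MeasureAsync(() => Task.Delay(10_000));` followed by an assertion that `units < 0.01`. The result covers the whole process, so any background work in the test host is included in the figure.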

Zetanova avatar Jun 14 '24 06:06 Zetanova

> It's not possible to read a counter that gives a "CPU work done" value.

We're just planning on sticking it in K8s with its own namespace and measuring mCPU used over time on a Grafana chart

Aaronontheweb avatar Jun 14 '24 14:06 Aaronontheweb