Global slowdown of PowerAPI with the SmartWatts formula

When running PowerAPI with SmartWatts on a system with more than 30 containers, it runs smoothly for a few minutes and then slows down to the point where it only processes one message every 20 seconds (or more!), even though the sensor produces far more reports than that (I've tested the sensor alone with a simple socket server and found no issue there).
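A minimal stand-in for such a "simple socket server" might look like the sketch below. It assumes the sensor writes one JSON report per line over TCP (the real framing used by the sensor and by SocketDB may differ), and every name in it (`CountingServer`, `count_reports`, etc.) is hypothetical. It uses a thread-per-connection `socketserver` server rather than asyncio.

```python
import socket
import socketserver
import threading
import time


class ReportCounter:
    """Thread-safe count of reports received by the server."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0

    def add(self, n: int = 1) -> None:
        with self._lock:
            self.count += n


class ReportHandler(socketserver.StreamRequestHandler):
    """Counts reports on one connection, ASSUMING one JSON report per line."""

    def handle(self) -> None:
        # Iterating over rfile yields one line at a time until the client disconnects.
        for line in self.rfile:
            if line.strip():
                self.server.counter.add()


class CountingServer(socketserver.ThreadingTCPServer):
    """Thread-per-connection TCP server (no asyncio involved)."""

    allow_reuse_address = True

    def __init__(self, address: tuple[str, int]) -> None:
        self.counter = ReportCounter()
        super().__init__(address, ReportHandler)


def count_reports(n_reports: int, timeout: float = 5.0) -> int:
    """Send n_reports fake reports to a local server and return how many were counted."""
    with CountingServer(("127.0.0.1", 0)) as server:  # port 0: pick any free port
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with socket.create_connection(server.server_address) as client:
            for i in range(n_reports):
                client.sendall(b'{"timestamp": %d}\n' % i)
        # The handler runs in its own thread, so wait (bounded) until it has read everything.
        deadline = time.monotonic() + timeout
        while server.counter.count < n_reports and time.monotonic() < deadline:
            time.sleep(0.01)
        server.shutdown()
        return server.counter.count
```

Pointing the real sensor at a server like this, and comparing the count against the elapsed time, isolates the sensor's output rate from anything happening inside the formula's actors.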

I'm not sure what is causing this behavior, but I have found a few leads that might explain it:

PowerAPI uses the multiprocQueueBase implementation of the ActorSystem, which, according to the Thespian documentation, has an "unexplained, core-level concern about dropped messages/deadlocks for the queue messaging in overload conditions".
The SocketDB used by the power_puller (I'm using the socket interface between the sensor and the formula) starts its server using asyncio, which is discouraged by the Thespian author (see https://groups.google.com/g/thespianpy/c/8Y1Ggnd_8rM/m/oum7F5kzBwAJ).

Unfortunately, I'm not sure the issue I'm encountering is actually related to these problems.

For full disclosure, I'm running into this with a custom, modified PowerAPI, although I doubt my modifications have anything to do with the slowdown: I simply add labels to reports (with a modifier actor) and use a custom Prometheus exporter.

Aug 25 '22 10:08 PierreRustOrange

Hello, which version of SmartWatts are you using? With the latest version you can change the ActorSystem implementation used (cf. https://github.com/powerapi-ng/powerapi/releases/tag/v1.1.0). The default is now simpleSystemBase.
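As a rough illustration of what "changing the ActorSystem implementation" involves, the sketch below picks a Thespian system base name from a configuration dict and validates it. The `actor_system` key and the `select_system_base` helper are hypothetical, not PowerAPI's actual option names; only the three base names discussed in this thread are accepted.

```python
# Thespian system bases discussed in this thread.
KNOWN_SYSTEM_BASES = frozenset({
    "simpleSystemBase",    # single process, synchronous
    "multiprocQueueBase",  # multiple processes, multiprocessing queues
    "multiprocTCPBase",    # multiple processes, TCP transport
})
DEFAULT_SYSTEM_BASE = "simpleSystemBase"  # the default since v1.1.0, per this thread


def select_system_base(config: dict) -> str:
    """Return the system base named under the hypothetical 'actor_system' key."""
    name = config.get("actor_system", DEFAULT_SYSTEM_BASE)
    if name not in KNOWN_SYSTEM_BASES:
        raise ValueError(f"unknown Thespian system base: {name!r}")
    return name


# The chosen name would then be handed to Thespian, e.g.:
#   from thespian.actors import ActorSystem
#   system = ActorSystem(select_system_base(config))
```

Failing fast on an unknown name avoids silently falling back to a different transport than the one the user asked for.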

Aug 25 '22 11:08 roda82

I'll try that, thanks! By the way, do you have any feedback on each ActorSystem implementation when running PowerAPI (beyond what's in Thespian's documentation), e.g. which is the fastest, the most reliable, etc.?

I also think we have an issue with logging when using multiprocQueueBase: plain Python logging does not work well across multiple processes (I've seen cases where log records were lost), and with the current logging implementation this will get worse. In the base actor we state that messages are sent to a logging actor (https://github.com/powerapi-ng/powerapi/blob/v1.1.0/powerapi/actor.py#L101), but in practice we simply use the standard Python logging framework.
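A common way to make standard Python logging safe across processes is the pattern from the Python logging cookbook: each process only puts records on a shared queue via `QueueHandler`, and a single `QueueListener` in the parent hands them to the real handlers. The sketch below shows that pattern in isolation; it is not PowerAPI's implementation, and `ListHandler`, `worker` and `run_demo` are hypothetical names. It uses the `fork` start method so it runs as a plain script on Linux; elsewhere you would use the default context and a `__main__` guard.

```python
import logging
import logging.handlers
import multiprocessing


class ListHandler(logging.Handler):
    """Collects formatted messages in memory (stand-in for a file/console handler)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


def worker(queue, worker_id: int, n_messages: int) -> None:
    # A child process only puts records on the queue; it never writes output itself.
    logger = logging.getLogger(f"worker-{worker_id}")
    logger.handlers = [logging.handlers.QueueHandler(queue)]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    for i in range(n_messages):
        logger.info("worker %d message %d", worker_id, i)


def run_demo(n_workers: int = 3, n_messages: int = 5) -> list[str]:
    ctx = multiprocessing.get_context("fork")  # Linux/macOS only; see lead-in
    queue = ctx.Queue()
    sink = ListHandler()
    workers = [ctx.Process(target=worker, args=(queue, w, n_messages))
               for w in range(n_workers)]
    for p in workers:
        p.start()
    # The listener thread is the single place where records are handled. It is
    # started after the forks so no extra thread exists when fork() is called.
    listener = logging.handlers.QueueListener(queue, sink)
    listener.start()
    for p in workers:
        p.join()
    listener.stop()  # processes the remaining queued records, then stops the thread
    return sink.messages
```

Because only one thread ever touches the real handlers, records from different processes can no longer interleave or be dropped by competing writers.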

Aug 25 '22 12:08 PierreRustOrange

I was using a version of PowerAPI from just before this change! I've switched to simpleSystemBase and it improves things a lot; the slowdown seems to be gone :) Thanks!

Aug 26 '22 07:08 PierreRustOrange

Great! Regarding your question about the different ActorSystem implementations, we tested them with 100,000 reports. Here is a summary of the results:

multiprocQueueBase

Lost messages: a sleep (0.25 s) is required to avoid them. The same problem was detected in the Inria demo. Known performance issue (cf. the Thespian documentation)
Execution time: unknown, since messages are lost

simpleSystemBase

Execution time: 90s.
Mono process and synchronous
No lost messages

multiprocTCPBase

Execution time: 50 minutes.
No lost messages
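To put these timings side by side, the short calculation below converts them into average throughput. It uses only the numbers reported above; multiprocQueueBase is left out because its execution time is unknown.

```python
REPORTS = 100_000

# Execution times reported above, in seconds.
EXECUTION_TIME_S = {
    "simpleSystemBase": 90,
    "multiprocTCPBase": 50 * 60,
}


def throughput(reports: int, seconds: float) -> float:
    """Average number of reports processed per second."""
    return reports / seconds


for base, seconds in EXECUTION_TIME_S.items():
    print(f"{base}: {throughput(REPORTS, seconds):.1f} reports/s")
# simpleSystemBase: 1111.1 reports/s
# multiprocTCPBase: 33.3 reports/s
```

In other words, simpleSystemBase was roughly 33 times faster than multiprocTCPBase in this test (3,000 s versus 90 s).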

As you stated, there is an issue with logging in the multiprocQueueBase implementation that is not present in simpleSystemBase: we observed that warnings were missing with the former but not with the latter. We therefore also need to improve log management in PowerAPI when using multiprocessing.

Aug 26 '22 09:08 roda82

Hello, this problem has been fixed since version 2.0.0 of PowerAPI. Please open a new issue if you encounter other performance problems.

May 05 '23 14:05 gfieni
