powerapi
Global slowdown of powerapi with the SmartWatts formula
When running powerapi with SmartWatts on a system with more than 30 containers, it runs smoothly for a few minutes and then slows down to the point where it only processes one message every 20 seconds (or more!), even though the sensor produces many more reports than that (I've tested the sensor alone with a simple socket server and found no issue there).
I'm not sure what is causing this behavior, but I have found a few leads that might explain it:
- powerAPI is using the multiprocQueueBase implementation of the ActorSystem, which, according to the documentation, has an "unexplained, core-level concern about dropped messages/deadlocks for the queue messaging in overload conditions."
- The SocketDB used by the power_puller (I'm using the socket interface between the sensor and the formula) starts its server using asyncio, which is discouraged by the thespian author (see https://groups.google.com/g/thespianpy/c/8Y1Ggnd_8rM/m/oum7F5kzBwAJ)
Unfortunately I'm not sure the issue I encounter is actually related to these problems.
For full disclosure, I'm running into this with a custom modified powerAPI, although I doubt my modifications have anything to do with the slowdown: I simply add labels to reports (with a modifier actor) and use a custom prometheus exporter.
Hello,
Which version of smartwatts are you using? With the latest version you can change the ActorSystem implementation that is used (cf. https://github.com/powerapi-ng/powerapi/releases/tag/v1.1.0). The default is now simpleSystemBase.
I'll try that, thanks !
BTW, do you have any feedback on each ActorSystem implementation when running powerAPI (besides what's in thespian's documentation), e.g. which is the fastest, the most reliable, etc.?
I also think we have an issue with logging when using multiprocQueueBase (simple Python logging does not work well with multiple processes, and I've seen cases where log records were lost), and it will be worse with multiprocQueueBase with the current logging implementation.
In the base actor we say that we send messages to a logging actor (https://github.com/powerapi-ng/powerapi/blob/v1.1.0/powerapi/actor.py#L101), but actually we simply use the standard Python logging framework.
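For reference, one common standard-library way to avoid losing log records across processes is to have every worker push its records onto a shared queue and let a single listener in the parent process write them out. The sketch below only illustrates that pattern; it is not how PowerAPI handles logging, and the names `configure_worker_logging`, `worker` and `run_demo` are made up for the example.

```python
import logging
import logging.handlers
import multiprocessing


def configure_worker_logging(log_queue):
    """Send every record emitted in this process to the shared queue."""
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(logging.DEBUG)


def worker(log_queue, worker_id):
    """Hypothetical stand-in for an actor process that emits a warning."""
    configure_worker_logging(log_queue)
    logger = logging.getLogger(f"actor-{worker_id}")
    logger.warning("report processed by worker %d", worker_id)


def run_demo(n_workers=3):
    """Start a few worker processes and drain their records in the parent."""
    log_queue = multiprocessing.Queue()
    # A single listener owns the real handler, so writes never interleave.
    listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
    listener.start()
    processes = [
        multiprocessing.Process(target=worker, args=(log_queue, i))
        for i in range(n_workers)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    # stop() handles everything already queued before the listener exits.
    listener.stop()


if __name__ == "__main__":
    run_demo()
```

Because only the listener touches the output handler, the worker processes never compete for the same stream or file.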
I was using a version of powerapi from just before this change!
I've switched to simpleSystemBase and it improves things a lot, the slowdown seems to be gone :)
thanks !
Great! Concerning your question about the different ActorSystem implementations, we tested them with 100,000 reports. Here is a summary of the results:
multiprocQueueBase:
- Lost messages: a sleep (0.25 s) is required to avoid them. The same problem was detected in the Inria demo. Known performance issue (cf. Thespian documentation).
- Execution time: unknown, as messages are lost
simpleSystemBase:
- Execution time: 90 s
- Single process and synchronous
- No lost messages
multiprocTCPBase:
- Execution time: 50 minutes
- No lost messages
As you stated, there is an issue with logging in the multiprocQueueBase implementation that is not present in simpleSystemBase. We observed that warnings were missing with the former but not with the latter. Therefore, we also need to improve log management in PowerAPI when using multiprocessing.
Hello,
This problem has been fixed as of version 2.0.0 of PowerAPI.
Please open a new issue if you encounter other performance problems.