High (steadily increasing) CPU-Load

Open rezzorix opened this issue 4 years ago • 16 comments

Hi, I am running a Pi-hole on the following system:

raspberry pi zero w Linux rpi0w 5.4.72+ #1356

Pi-hole v5.2.2 Web Interface v5.2.2 FTL v5.3.4

With just Pi-hole running, the CPU load stays below 20%.

Once I start padd.sh, the CPU load climbs steadily (30%... 60%... 80%) over the course of 20 to 30 seconds and then stays high.

I cannot imagine that padd.sh needs that many resources. Has anyone else run into the same issue? Is there a known solution?

rezzorix avatar Jan 06 '21 11:01 rezzorix

I am noticing very high CPU usage with PADD running too, 80% on average. Killing PADD returns the CPU load to a 20% average.

Raspberry Pi 1 Model B+

BangDroid avatar Jan 20 '21 14:01 BangDroid

On my Raspberry Pi 1B, I reduce the refresh frequency from 5 seconds to 15 seconds. This greatly reduces the load averages.

https://github.com/pi-hole/PADD/blob/ef5acdc7e09389ce4212726d43f084c25a82b53a/padd.sh#L1175-L1176

fullburnen avatar Feb 02 '21 05:02 fullburnen

On my Raspberry Pi 1B, I reduce the refresh frequency from 5 seconds to 15 seconds. This greatly reduces the load averages.

https://github.com/pi-hole/PADD/blob/ef5acdc7e09389ce4212726d43f084c25a82b53a/padd.sh#L1175-L1176

Hi, OK, thanks. I have now tried this for some time: the CPU load still climbs to the maximum over time; it just takes longer than with the 5-second sleep.

I am testing on an RPi 3B+ and an RPi Zero W, both with similar issues. Also, because of the increased CPU load, the temperature of the RPis is rising.

So I really suspect that something in the padd.sh code causes the CPU to overload over time.

BTW, with pihole -c these CPU issues do not appear at all.

rezzorix avatar Feb 02 '21 07:02 rezzorix

Same thing on my ZeroW with Pi-hole v5.2.4 Web Interface v5.3.2 FTL v5.6

Another strange result: on first launch, PADD reports v3.5.1, but after the first 5 second sleep/refresh, the displayed PADD version changes to v3.4 (and turns red). (Perhaps a separate bug.)

HansTallis avatar Feb 08 '21 15:02 HansTallis

The version is hardcoded into the script at the top. There's no real way for it to change to a different version unless there's actually a different version in the same directory and you're calling that one.

Edit: https://github.com/pi-hole/PADD/blob/master/padd.sh#L20

dschaper avatar Feb 08 '21 15:02 dschaper

Thank you dschaper for the quick reply. It seems I had a stale piHoleVersion file (read by GetVersionInformation()) that wasn't getting updated, perhaps because it was not writable or was in the wrong directory. I can't recreate the problem now, but it seems fixed in any event. (Possibly an interaction between "if [ -e ... ]" and the "source" command?)

HansTallis avatar Feb 08 '21 15:02 HansTallis

Thanks for the follow-up. I honestly haven't had time to look at things lately but I would think that the version of PADD shouldn't be part of any version file since it's actually in the PADD script itself.

dschaper avatar Feb 08 '21 15:02 dschaper

Alright, so the versioning is fixed... how about the initial topic, "performance" (high CPU load)?

I was able to reproduce it on a freshly installed and updated Raspbian + Pi-hole. The high CPU load occurred on both the RPi Zero W and the RPi 3B+.

If padd is really going to be the replacement for "pihole -c" it would need to get fixed first... no?

rezzorix avatar Feb 08 '21 17:02 rezzorix

If padd is really going to be the replacement for "pihole -c" it would need to get fixed first... no?

Sure, sounds logical.

dschaper avatar Feb 08 '21 17:02 dschaper

A couple of notes:

  1. NormalPADD's for() loop displays the information (retrieved last time), then retrieves information, then sleeps 5 seconds. So the display of information (like "uptime" results) is always at least 5 seconds stale. (A simple fix is to move the "sleep 5" above the calls to GetVersionInformation etc.)

  2. There is a lot of information that perhaps should be pulled only once (not every N seconds):
     2a. NetworkInformation: ip address, hostname, perhaps gateway; DHCP status; DNSSEC status
     2b. Should we read setupVars.conf every time? I guess pihole configuration could change while PADD is running
     2c. VersionInformation: As Dan points out, PADD version can't change (though latest version can). Note that latest version polls github every time, thus every N seconds -- perhaps this should only be once/day.

  3. "CPU Load" is just the uptime 1-minute average * 100%? I'm no Linux guru, but I think uptime (/proc/loadavg) reports the number of runnable processes, which isn't really bounded. For a 0-100% display, like what htop shows at the top, a different measurement is needed (like what chronometer.sh does: ps -eo pcpu,rss and sum up the first column values).
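
One way to get a bounded 0-100% figure of the kind point 3 asks for is to compare two samples of the aggregate cpu line in /proc/stat. The sketch below is only an illustration, not what padd.sh or chronometer.sh does. The function names cpu_totals and cpu_busy_percent are hypothetical, and the code assumes the Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Sketch only: CPU busy % between two samples of the "cpu ..." line of /proc/stat.
# cpu_totals and cpu_busy_percent are hypothetical names, not part of padd.sh.

# Print "<total jiffies> <idle jiffies>" for one "cpu ..." line.
cpu_totals() {
    # Word-split the line into positional parameters:
    # $1=label $2=user $3=nice $4=system $5=idle $6=iowait $7=irq $8=softirq $9=steal
    set -- $1
    echo "$(( $2 + $3 + $4 + $5 + ${6:-0} + ${7:-0} + ${8:-0} + ${9:-0} )) $(( $5 + ${6:-0} ))"
}

# Print the integer busy percentage (0-100) between two "cpu ..." lines.
cpu_busy_percent() {
    before=$(cpu_totals "$1")
    after=$(cpu_totals "$2")
    total_delta=$(( ${after% *} - ${before% *} ))
    idle_delta=$(( ${after#* } - ${before#* } ))
    # No elapsed jiffies (identical samples): report 0 instead of dividing by zero.
    if [ "$total_delta" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( (total_delta - idle_delta) * 100 / total_delta ))
}

# Usage on a live Linux system:
#   first=$(head -n 1 /proc/stat); sleep 1; second=$(head -n 1 /proc/stat)
#   cpu_busy_percent "$first" "$second"
```

Unlike the load average, this value cannot exceed 100, because it is the non-idle share of the jiffies that elapsed between the two samples.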

HansTallis avatar Feb 08 '21 17:02 HansTallis

Regarding performance: I've been trying to track this down a bit by "commenting out" various combinations of the five "GetXXXInformation()" functions. (Each is called from several places. To disable one, I add a "return" at the top of the function.)

The loads below are from a separate terminal running uptime in a loop. I would make a change, then wait for the load to hit a minimum over the next few minutes.

Not running PADD, running 'uptime' on a 5 second loop produces a load average of 0.01-0.05.

If I disable all GetXXXInformation calls and launch PADD, load is 0.04. This is roughly the cost of SizeChecker and Print*Information with no live information: essentially nil.

If I enable "GetSummaryInformation" (the sine qua non of padd), load is 0.3. Then enable "GetPiholeInformation" -> load is 0.4. Then enable "GetNetworkInformation" -> load is 0.45. Then enable "GetSystemInformation" -> load is 0.45. Then enable "GetVersionInformation" -> load is 0.45.

Then disable "GetSummaryInformation" (only the other 4 of 5 are enabled), load is 0.3.
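
The measurement method described above (sampling the load from a separate terminal and waiting for it to bottom out) could be scripted roughly as follows. This is a sketch under assumptions, not the exact loop used for the numbers above: the helper names one_minute_load and min_load are hypothetical, and the load is read from /proc/loadavg rather than parsed out of uptime.

```shell
#!/bin/sh
# Sketch only: sample the 1-minute load average and report the minimum seen.
# one_minute_load and min_load are hypothetical helper names.

# Print the 1-minute load average from a loadavg-style file.
one_minute_load() {
    read -r one _rest < "${1:-/proc/loadavg}"
    echo "$one"
}

# min_load <count> <interval_seconds> [file]
# Take <count> samples, <interval_seconds> apart, and print the smallest one.
min_load() {
    count=$1
    interval=$2
    file=${3:-/proc/loadavg}
    min=""
    i=0
    while [ "$i" -lt "$count" ]; do
        load=$(one_minute_load "$file")
        # awk does the floating-point comparison that sh arithmetic cannot.
        if [ -z "$min" ] || awk -v a="$load" -v b="$min" 'BEGIN { exit !(a < b) }'; then
            min=$load
        fi
        i=$(( i + 1 ))
        # Sleep between samples, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    echo "$min"
}

# Usage on a live Linux system: 36 samples, 5 seconds apart (about 3 minutes).
#   min_load 36 5
```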

HansTallis avatar Feb 08 '21 17:02 HansTallis

(sorry for the flood of posts..) Per the above note about CPU usage vs load: though the load can, at times, be several processes (which padd would display as CPU% > 100 for a single core CPU like the Zero), even while this is happening the actual CPU % can be under 50% -- confirmed with htop.

I also notice that during these "load storms" that may rise and fall, a few php-cgi processes may be active. Not sure if these are because the web interface is active, but if you're checking performance I'd kill any clients of that interface while you're testing.

HansTallis avatar Feb 09 '21 02:02 HansTallis

I'd kill any clients of that interface while you're testing

I doubt that's a good idea. You may be running a long-term data query which takes several (dozens of) seconds to complete. You shouldn't kill any other processes just to get some displayed (and otherwise entirely irrelevant) numbers.

The values returned by uptime are the system load averages for the past 1, 5, and 15 minutes. The percentages shown by htop etc. are momentary samples which miss a lot of activity. They are especially bad at measuring very short-lived (< 0.2 sec) processes. uptime (or the header of w) is always right, as its values are supplied by the kernel itself, which is responsible for process management and knows about all processes.

DL6ER avatar Feb 09 '21 04:02 DL6ER

Thank you DL6ER. And forgive my ignorance! I was trying to suggest that, while running performance tests, it's good to minimize any activities that could drive activity on the pi you're doing perf testing on. e.g. I had, on a different computer, a browser open to my pi's Pihole web interface, which was obviously demanding regular refreshes. I guessed this was causing the php-cgi processes to regularly consume CPU on my pi. so I meant to kill any pihole web interface clients.

HansTallis avatar Feb 10 '21 01:02 HansTallis

If you want a really meaningful measurement of CPU load, I'd suggest using the 15-minute averaged load and dividing it by the number of available CPU cores. This will not be affected (much) by short-lived processes but will reveal if there is anything systematic you should look at.
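
A minimal sketch of that normalization, assuming a Linux system where the kernel exposes the 1-, 5-, and 15-minute load averages in /proc/loadavg and nproc reports the core count. The function name load_per_core is hypothetical.

```shell
#!/bin/sh
# Sketch only: divide a load average by the number of CPU cores.
# load_per_core is a hypothetical name, not part of padd.sh.

# load_per_core <load_average> <core_count>
# Print the per-core load with two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores < 1) cores = 1   # guard against a bogus core count
        printf "%.2f\n", load / cores
    }'
}

# Usage on a live Linux system (third field of /proc/loadavg is the 15-minute average):
#   read -r one five fifteen rest < /proc/loadavg
#   load_per_core "$fifteen" "$(nproc)"
```

A result near or above 1.00 means that, averaged over the window, there was at least one runnable process per core. The value is not capped, so it can exceed 1.00.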

DL6ER avatar Feb 11 '21 12:02 DL6ER

Just to summarize issues on this thread..

  1. What's the best way to measure load? Roughly, # of schedulable processes vs % of time the CPU is non-idle. (These are distantly related.)
  2. Why does PADD generate so much load?
  3. How should PADD display load?
  4. Other misc improvements to PADD.

I'll defer on #1, but simply point out that these two measures are different and only modestly related. And while the # of ready processes is unbounded, the non-idle fraction of CPU time should be bounded at 100%.

#2: I did some testing above (across the 5 GetXXXInformation routines) .. will let others opine.

#3: this is definitely an issue with PADD. It displays load twice: first, the raw load; the second time, load / # cores * 100%. The second measure is redundant. But also misleading, as it's displayed on a meter that maxes out at 100%. I'd suggest removing this meter.

#4: See the first two in "a couple of notes" posting above.

HansTallis avatar Feb 11 '21 14:02 HansTallis

This issue is stale because it has been open 30 days with no activity. Please comment or update this issue or it will be closed in 5 days.

github-actions[bot] avatar Dec 12 '22 08:12 github-actions[bot]

Did this ever go anywhere?

I've recently resurrected my old Raspberry Pi B Rev 2, but almost as soon as I start PADD the load starts increasing, and after around 10 minutes it's pegged.

I'll try bumping up the sleep value, but I suspect that won't be enough.

BeingTomGreen avatar Jan 02 '24 22:01 BeingTomGreen

Tom, I made several changes to the 2021 PADD script to improve a few things, including reducing what seemed to be excessive polling. I don't know if any of these were picked up by the PADD author.
I am running padd v3.11.0 on a RPi 3 with load < 0.2 steadily.
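
For illustration, one generic way to reduce polling of slow-changing values (such as the latest-version check discussed earlier in the thread) is to cache the result with a timestamp and re-run the command only once the cache is too old. This is a sketch under assumptions and not necessarily what any PADD release does. The function name cached_run, the cache file layout, and the example path are all hypothetical.

```shell
#!/bin/sh
# Sketch only: re-run an expensive command at most once per max_age seconds.
# cached_run and its cache file format are hypothetical, not part of padd.sh.

# cached_run <cache_file> <max_age_seconds> <command> [args...]
# Line 1 of the cache file is a Unix timestamp; the rest is the command's output.
cached_run() {
    cache_file=$1
    max_age=$2
    shift 2
    now=$(date +%s)
    if [ -f "$cache_file" ]; then
        read -r stamp < "$cache_file"
        # Treat a missing or non-numeric timestamp as "very old".
        case $stamp in
            '' | *[!0-9]*) stamp=0 ;;
        esac
        if [ $(( now - stamp )) -lt "$max_age" ]; then
            # Cache is fresh: print it without running the command.
            tail -n +2 "$cache_file"
            return 0
        fi
    fi
    # Cache is missing or stale: run the command and store its output.
    output=$("$@") || return 1
    printf '%s\n%s\n' "$now" "$output" > "$cache_file"
    printf '%s\n' "$output"
}

# Usage (placeholder command and path): refresh at most once per day.
#   cached_run /tmp/example_latest_version 86400 some_slow_check
```

With a 5-second refresh loop and a one-day max_age, the wrapped command runs once instead of roughly 17,000 times per day.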

HansTallis avatar Jan 26 '24 15:01 HansTallis