High CPU usage and bad browsing experience after upgrade to v6 from v5

Open lan17 opened this issue 9 months ago • 91 comments

Versions

  • Pi-hole: v6.0.5
  • AdminLTE:
  • FTL: v6.0.4

Platform

  • OS and version: Linux raspberrypi 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
  • Platform: Raspberry Pi

Expected behavior

Works fast, doesn't consume much CPU

Actual behavior / bug

Slow, consumes a lot of CPU, internet browsing is noticeably slower. ~15-20% CPU consumption vs <1% before.

It seems to be due to pihole-FTL

Steps to reproduce

Upgrade from v5 to v6. There must be some performance bugs.

Debug Token

  • URL:

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

lan17 avatar Mar 07 '25 07:03 lan17

Please provide more information: when do you see high CPU usage? All the time? Only on HTTPS access to the web interface? Only during gravity run? Please provide a debug token.

yubiuser avatar Mar 07 '25 07:03 yubiuser

I experienced the same issue when I ran an upgrade. I was running PiHole exclusively on a Pi3. The WebUI was horribly slow and DNS stopped resolving; I had to actually remove it from my network and point my router to Cloudflare.

FatherOfCurses avatar Mar 07 '25 18:03 FatherOfCurses

Ich hatte überhaupt keine Probleme mit dem Update, trotz eines alte 3B+

admin edit - adding English:

I had no problems at all with the update, despite an old 3B+

Peter1979pl avatar Mar 07 '25 18:03 Peter1979pl

I am running PiHole in a Proxmox environment. The same problem here. It was running perfectly before (v5) with one core and 500 MB RAM. Now (v6) it is struggling with 4 cores and 4 GB RAM....

For me too it seems that pihole-FTL is the problem, as it has by far the most CPU time when checking on the CLI. When checking in the browser under System I get the following information: 143.7% (load: 5.75 19.78 12.48) on 4 cores running 894 processes (0.2% used by FTL)

The PiHole Dashboard sometimes shows CPU loads of 600% and DNS requests get stuck. Today PiHole did not respond for two minutes...

kevingufler avatar Mar 08 '25 03:03 kevingufler

I have the same issue

  • Raspberry pi model B rev 2 (yes one of the first ones)
  • CPU BCM2835 @ 800MHz and 429MiB of RAM
  • Raspbian Lite (Raspbian GNU/Linux 11 (Bullseye)), armv6l, kernel 6.1.21+

The previous v5 had no issues at all for me. It was rock solid, and now it's not working correctly anymore. Some of the issues are as follows:

  • high cpu util
  • slow web UI
  • DNS resolver seems to be broken too.
    • Sometimes it works, but most of the time it does not, or it runs into the timeout limit.

It is dedicated to only run pihole.

Does anyone have a fix for this issue?

Edit: Finally got to the system info tab in the web UI.

CPU: 168.5% (load: 1.95 1.68 1.72) on 1 core running 122 processes (78.2% used by FTL)

FKdeveloper avatar Mar 08 '25 18:03 FKdeveloper

Same issue here, and also:

  • Query log seems to stop at a certain point in time
  • Refreshing the log does not have any effect
  • Activating "Query on-disk data" under "Advanced filtering" seems to trigger the CPU increase: CPU: 14.2% (load: 0.57 0.16 0.05) on 4 cores running 162 processes (56.2% used by FTL). Without on-disk query log access: CPU: 0.0% (load: 0.00 0.05 0.07) on 4 cores running 161 processes (1.2% used by FTL)
  • Raspberry Pi 3 B, Arm Cortex-A53, quad-core, 1 GB RAM
  • Linux 6.1.31-v8.1.el8.altarch #1 SMP PREEMPT Sat Jun 10 22:16:25 UTC 2023 aarch64

2-10kur avatar Mar 09 '25 07:03 2-10kur

In my case, after the latest update of FTL, it used 100% of the CPU, the DNS server was blocked, and the GUI was also blocked. After rebooting, it consumes all available RAM, and then my watchdog kills the process, which keeps repeating. Also, it taxes one of the cores to 100%. Core [v6.0.4], FTL [v6.0.3] and Web UI [v6.0.1] are working fine, though, and there were no issues before the update yesterday to the new version. I restored the machine to its original state, hopefully this gets resolved soon.

dusancoko avatar Mar 09 '25 08:03 dusancoko

Hi,

Facing the same issue (this machine is only used for pihole. Nothing else)

Core version is v6.0.5 (Latest: v6.0.5)
Web version is v6.0.2 (Latest: v6.0.2)
FTL version is v6.0.4 (Latest: v6.0.4)

Granted, this is a Pi Zero, but version 5 handled this machine without any issues. Now I have high CPU usage (without the web UI open), and with some frequency DNS queries just fail

If this helps, this is the htop output (taken when load is not actually that high: only a little over 1, though it easily reaches 2 or 3); it seems the database process takes quite a bit of CPU

Image

This is the size of the long term database, don't know if it's related.

$  du -k /etc/pihole/pihole-FTL.db
722496  /etc/pihole/pihole-FTL.db

Should I do something to reduce this size? I don't really care that much about 365 days of queries, which I believe is the default retention
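For what it's worth, retention can be shortened so the long-term database stays small. A hedged sketch, assuming the v6 config key is database.maxDBdays (verify against /etc/pihole/pihole.toml or pihole-FTL --help on your install):

```shell
# Assumption: v6 exposes query retention as database.maxDBdays (days, default 365).
# Shorter retention means FTL prunes old rows and the DB file stops growing as fast.
sudo pihole-FTL --config database.maxDBdays 30
sudo systemctl restart pihole-FTL
```

Note the file does not shrink immediately; SQLite reuses freed pages rather than returning them to the filesystem.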

If I open the web UI I get this

Image

but htop doesn't really show CPU usage for the web workers

tspascoal avatar Mar 09 '25 20:03 tspascoal

My personal workaround was to stop pihole-FTL, rename the database file, and restart the pihole-FTL service. At least for the time being, the query log displays correctly and CPU usage has dropped back to 1-2%
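For anyone trying the same thing, the rename workaround amounts to roughly the following (a sketch only; the database path assumes a default install, and FTL recreates the file on the next start):

```shell
#!/bin/sh
# Rotate the long-term database out of the way and let FTL start fresh.
# /etc/pihole/pihole-FTL.db is the Pi-hole default path; adjust if yours differs.
set -e
DB=/etc/pihole/pihole-FTL.db
sudo systemctl stop pihole-FTL
sudo mv "$DB" "$DB.bak"        # keep the old file in case it is needed later
sudo systemctl start pihole-FTL
```

Keeping the renamed file around (rather than deleting it) preserves the old query history in case the developers want to inspect it.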

2-10kur avatar Mar 09 '25 21:03 2-10kur

Deleting the database file did the trick.... CPU is now a lot saner and no pihole failures so far

Image

tspascoal avatar Mar 09 '25 22:03 tspascoal

So I can confirm the following:

My Raspi configuration has not changed in the last few months. The last OS update was on February 26th, but the problems have only been occurring since the update to v6. No problems in the 4 years before. Today is the second time I've had this problem, which prompted me to go searching, and I found this issue.

And renaming the DB file and restarting the service also helped me. (Thanks for the tip)

Image

NordFreak avatar Mar 10 '25 23:03 NordFreak

Just adding my voice to this issue. Like @FKdeveloper I've been running pihole on a rev 1.3 Model B for years. Renaming the database file works for a short time but after a couple of days of use (on a large home network with maybe two dozen clients), Load Average on htop is significantly above 1.0 across all three measures, DNS queries start to time out and the web interface becomes extraordinarily sluggish. This is a setup that ran up to the V5.x releases with no performance issues whatsoever.

voxdumnonia avatar Mar 12 '25 15:03 voxdumnonia

Addendum: I don't think this should be happening - 10.02 1m load average with an uptime of under four hours! And what's it doing with all that swap space?!

Image

voxdumnonia avatar Mar 12 '25 19:03 voxdumnonia

Ah, forgot to mention: in my case I also did a swapoff -a, as I do not see why it would need swap at all. Mine has been stable since I wrote the comment

2-10kur avatar Mar 12 '25 19:03 2-10kur

Thanks to the friends above for your help. Since upgrading to V6, I've frequently encountered issues where DNS fails to resolve any websites. I'm running Pi-hole on Proxmox VE. Following your advice, I deleted the database, and I haven’t experienced any domain resolution issues since. THX again!!!! Big THX!

marcuccilli avatar Mar 16 '25 19:03 marcuccilli

I know that's not helping anyone... but on my side the migration from v5 to v6 went without issues

Base system: Debian 12.x inside VirtualBox on FreeBSD 13.x

The new v6 is fast as hell

lux73 avatar Mar 22 '25 07:03 lux73

Same setup with Proxmox that I have had since 2020 and have been updating regularly without issues. My database is big--about 2.5GB--but that has never been an issue. With the upgrade to 6, it chokes.

a-meet avatar Mar 22 '25 21:03 a-meet

Having the same issue about once every 24 hours or so, where the process "civetweb-master" drives a single thread to 600% until the FTL database is removed and FTL restarted, OR if I just reboot the container. My instance is a Proxmox LXC created from the Proxmox helper scripts. I created this one from scratch at version 6.0.4 and have had problems since install. There is some more documentation of this problem on the Pi-hole forums: https://discourse.pi-hole.net/t/pihole-unresponsive-after-update-to-6-0-extreme-high-cpu-usage/77193

dcwestra avatar Mar 26 '25 02:03 dcwestra

Since I wrote this, I've managed to get PiHole into a state where recording is broken, but it's still resolving DNS queries and serving DHCP (and thus "working"). In the FTL logs on bootup I get the following:

ERROR SQLite3: no such table: ftl in "SELECT VALUE FROM ftl WHERE id = 0;" (1)
2025-03-22 11:53:17.562 ERROR Encountered prepare error in db_query_int("SELECT VALUE FROM ftl WHERE id = 0;"): SQL logic error
2025-03-22 11:53:17.563 WARNING Database not available, please ensure the database is unlocked when starting pihole-FTL !
2025-03-22 11:53:17.625 ERROR init_memory_database(): Failed to attach disk database

However, my CPU usage is on the floor, and there's no swap usage to speak of. It seems clear that there's a pretty horrific problem with the database link in all v6 releases that's causing this, since removing the database from use entirely (as I've accidentally done) has fixed the issue for a couple of weeks now. I'm going to hazard a guess that database writes are happening too slowly on some systems, so memory (and then swap space) fills up while timeouts cause the write processes to repeat, taking more and more CPU time as more writes get added to the queue. At a guess, people running in containers aren't hitting this because their writes occur at a speed that doesn't hold things up.
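The queueing guess above can be sketched with a toy model (rates are purely illustrative, not measured from FTL): as long as the disk flushes writes faster than queries arrive, the queue stays empty; once arrivals exceed the flush rate, the backlog, and with it memory use, grows linearly.

```python
# Toy model of the backlog hypothesis: if DNS queries arrive faster than the
# long-term database can flush them, the in-memory write queue grows without
# bound, filling RAM and eventually swap. Rates below are made-up examples.
def backlog_after(seconds, arrivals_per_s, flushes_per_s):
    """Return the number of queued writes after `seconds` at constant rates."""
    backlog = 0
    for _ in range(seconds):
        backlog += arrivals_per_s               # new queries queued for writing
        backlog -= min(backlog, flushes_per_s)  # what the disk manages to flush
    return backlog

print(backlog_after(60, arrivals_per_s=50, flushes_per_s=100))  # fast disk: 0
print(backlog_after(60, arrivals_per_s=50, flushes_per_s=20))   # slow SD card: 1800
```

This would be consistent with slow SD cards and spinning disks being hit hardest while fast container storage keeps up.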

voxdumnonia avatar Mar 26 '25 10:03 voxdumnonia

I may also be experiencing this issue. Pihole on a Raspberry Pi Zero 2 W. In the past it has always been very responsive, both the web UI as well as DNS resolution. Since updating I've been experiencing rather regular times where DNS resolution times out and browsers hitting timeouts. This happens without accessing the web UI at all.

I keep trying to quickly SSH to the Pihole when it happens and I often see high CPU load of pihole-FTL as well as over 100 MB of swap used. Before the update, the system never swapped more than ~5MB or so. I reduced swappiness to 1 but the system keeps swapping despite memory being available and hangs keep happening. Something has definitely changed since the update.

Unfortunately I don't have any new info to add at this time, but wanted to add to the fact that it doesn't seem to only affect some isolated cases.

fshimizu avatar Mar 26 '25 16:03 fshimizu

I believe I am also experiencing this issue, though it's not on a Raspberry Pi, as I have PiHole installed on Ubuntu Server on a tiny form factor HP EliteDesk 705.

I rebooted to see if it would improve the UI experience; it did not. The web UI is still quite slow and laggy, and when this happens, top shows high CPU usage by the pihole-FTL process. This is a new problem: the system was a rock before v6, though the problem really became apparent recently.

Image

This screenshot may be more revealing:

Image

christopheradlam avatar Mar 28 '25 12:03 christopheradlam

I've restored a backup OS image with Pihole 5 some 20 hours ago and it's been a big difference. The general CPU load and RAM usage on the system are now much lower than before. Swap remains at single digits whereas with 6 it would swap to >100MB minutes after boot. The web UI is consistently much more snappy again and I haven't noticed any hangs or other outliers. Most importantly I haven't had any times of slow or timing out DNS resolution on clients any more, so far.

Last version I had issues with, before the downgrade: Core v6.0.5 FTL v6.0.4 Web interface v6.0.2

Old version I downgraded to: Pi-hole v5.18.4 FTL v5.25.2 Web Interface v5.21

Edit: I still have the other SD card with the newer image with Pihole 6 on it. So I can boot to that if I can help provide some information that may help investigation.

fshimizu avatar Mar 28 '25 16:03 fshimizu

I'm also experiencing this problem with the newest Pi-hole version on my Proxmox Debian (fully updated) LXC. On my Proxmox host I could observe a high I/O rate since today 00:00: Image

I saw the high read rate coming from the process pihole-FTL -f [database] while the DNS lookups and also the Pi-hole dashboard froze for about 20 s. This process ran repeatedly at intervals (I didn't measure the timing exactly).

The log entries lead to the conclusion that the FTL database is not okay (ERROR: Error while trying to close database: database is locked, ERROR SQLite3: recovered frames from WAL file /etc/pihole/pihole-FTL.db-wal), despite pihole-FTL d test telling me that everything is alright regarding the FTL databases.
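An independent way to cross-check a suspect database, separate from FTL's own checks, is SQLite's built-in integrity check. A sketch using Python's bundled sqlite3 module, with the default database path assumed (stop pihole-FTL first so the file is not locked):

```python
# Offline integrity check of the long-term database using Python's bundled
# sqlite3 module. Run only while pihole-FTL is stopped, so the file is free.
import sqlite3

def integrity_ok(path):
    """Run SQLite's built-in integrity check and report whether it passes."""
    with sqlite3.connect(path) as conn:
        (result,) = conn.execute("PRAGMA integrity_check;").fetchone()
    return result == "ok"

# Example (default install path, adjust as needed):
# print(integrity_ok("/etc/pihole/pihole-FTL.db"))
```

If this reports problems while FTL's check passes, that would point at the WAL recovery path rather than the base file.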

Even if I restore backups from one week ago (same Pi-hole versions), the same issue happens again instantly; it was fine before today (at least no 20 s DNS timeouts). pihole -r wasn't helping either. I just deleted the current pihole-FTL.db and it has been working without issues for now.

Let me know if my logfiles or further information would be helpful. I even have older backups (going back to March 9th) if someone wants to verify with the last Pi-hole versions.

fxwgr avatar Apr 27 '25 17:04 fxwgr

I've been experiencing this problem as well lately. DNS resolution eventually starts going very slow and eventually stops entirely. I'm running the pihole in a proxmox lxc. The entire container becomes unresponsive. I've been able to stop the container and access it quickly upon restart to disable FTL and recover the server:

pct start 106 && pct enter 106
systemctl stop pihole-FTL

For now, I've disabled pihole and set unbound to listen on port 53. DNS is VERY snappy now.

Yesterday, I had success for a while after renaming the FTL db and then restarting pihole. But that didn't last. I created a whole new unbound + pihole server from scratch and installed keepalived in hopes of having a warm standby. But the primary server started experiencing the same thing today!

I'm seeing a number of errors in the log. Several relate to the database being locked when trying to delete. The system ends up rate-limiting my router that forwards DNS requests and the load average of the system goes through the roof.

Also, this error:

508460/T508693] INFO: Compiled 1 allow and 5 deny regex for 60 clients in 62.7 msec
2025-04-27 17:40:34.053 PDT [508722/F508460] ERROR: Error when obtaining outer SHM lock: Previous owner died
2025-04-27 17:40:51.608 PDT [508722/F508460] ERROR: Error when obtaining inner SHM lock: Previous owner died

And I saw over a thousand gravity update errors:

FTL.log.1:2025-04-26 07:54:37.029 PDT [12049/T12089] ERROR: gravity_updated(): SELECT value FROM info WHERE property = 'updated'; - SQL error step: no more rows available

The DB has these stats when broken:

2025-04-27 17:40:26.099 PDT [508460M] INFO: Imported 1771350 queries from the long-term database
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Total DNS queries: 1771350
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Cached DNS queries: 99709
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Forwarded DNS queries: 1662048
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Blocked DNS queries: 8360
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Unknown DNS queries: 5
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Unique domains: 3273
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Unique clients: 60
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> DNS cache records: 712
2025-04-27 17:40:26.100 PDT [508460M] INFO:  -> Known forward destinations: 2

Debug log link: https://tricorder.pi-hole.net/SWxURWPk/

jaxley avatar Apr 28 '25 01:04 jaxley

How do you have the LXC configured? Can you provide a copy of the node.conf file?

root@pve-silver:/etc/pve/lxc# cat 107.conf
arch: amd64
cmode: console
cores: 1
features: nesting=1
hostname: pihole
memory: 1024
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.1,hwaddr=BC:24:11:82:1C:DB,ip=192.168.1.15/24,ip6=auto,type=veth
onboot: 1
ostype: debian
rootfs: local-zfs:subvol-107-disk-0,mountoptions=discard,size=50G
swap: 1024
tags: 192.168.1.15
unprivileged: 1

dschaper avatar Apr 28 '25 02:04 dschaper

arch: amd64
cores: 1
features: keyctl=1,nesting=1
hostname: unbound
memory: 512
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:D8:B2:C3,ip=dhcp,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-106-disk-0,size=2G
startup: order=1
swap: 512
tags: community-script;dns
unprivileged: 1
lxc.prlimit.memlock: unlimited

jaxley avatar Apr 28 '25 02:04 jaxley

This is mine, in case it helps - I too was running v5 for well over a year without any issues like this:

arch: amd64
cores: 4
features: nesting=1
hostname: RALPH
memory: 1024
nameserver: 192.168.1.1
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=BC:....:75,ip=192.168.1.9/24,type=veth
onboot: 1
ostype: debian
rootfs: local-zfs:subvol-102-disk-0,size=16G
startup: order=1
swap: 1024
tags: container;linux
timezone: Europe/Luxembourg
unprivileged: 1

Proxmox v8.4.1

Taomyn avatar Apr 28 '25 11:04 Taomyn

@jaxley That root disk is very small at 2G. Can you post the df for the container? If you have insufficient disk space or SHM, then things will not function properly.

root@pihole:~# df -h
Filesystem                    Size  Used Avail Use% Mounted on
rpool/data/subvol-107-disk-0   50G  557M   50G   2% /
none                          492K  4.0K  488K   1% /dev
udev                          7.7G     0  7.7G   0% /dev/tty
tmpfs                         7.8G  9.4M  7.7G   1% /dev/shm
tmpfs                         3.1G   96K  3.1G   1% /run
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                         1.6G     0  1.6G   0% /run/user/0

And @Taomyn what kind of drive do you have for your zfs storage? Is it SSD or a spinner? If it's SSD then try setting the mount option for discard.

dschaper avatar Apr 28 '25 13:04 dschaper

@dschaper it's an NVMe drive - I hadn't seen that option before and there's nothing about it I can find on the Proxmox site, but I've enabled it and restarted the container. As it can be days/weeks before it happens again, I'm not sure if I can test anything else. We'll just have to see.

Taomyn avatar Apr 28 '25 13:04 Taomyn

Filesystem                       1K-blocks    Used Available Use% Mounted on
/dev/mapper/pve-vm--106--disk--0   1992552 1390212    481100  75% /
none                                   492       4       488   1% /dev
udev                               3920956       0   3920956   0% /dev/tty
tmpfs                              3954924       0   3954924   0% /dev/shm
tmpfs                              1581972      96   1581876   1% /run
tmpfs                                 5120       0      5120   0% /run/lock

The new system has a smaller root disk than the original one (2G vs 8G). But both were showing the same symptoms. Here's the old one:

Filesystem                       1K-blocks    Used Available Use% Mounted on
/dev/mapper/pve-vm--101--disk--0   8154588 4857680   2861096  63% /var/lib/lxc/101/rootfs

Original system config:

# cat /etc/pve/lxc/101.conf 
arch: amd64
cores: 1
features: nesting=1
hostname: pihole-new
memory: 512
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=BC:24:11:AD:7A:5A,ip=dhcp,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-101-disk-0,size=8G
startup: order=2
swap: 512
tags: dns
unprivileged: 1
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file

jaxley avatar Apr 28 '25 13:04 jaxley