File descriptor leak with bind-dnssec-db when sqlite database is not writable
- [x] This is not a support question, I have read about opensource and will send support questions to the IRC channel, Github Discussions or the mailing list.
- [x] I have read and understood the 'out in the open' support policy
- Program: Authoritative
- Issue type: Bug report
Short description
pdns_server seems to leak file descriptors to the sqlite3 database. I can reliably reproduce this happening once every zone-cache-refresh-interval. By leak I mean it opens the database again and does not close previous iterations of the database.
Environment
- Operating system: Debian 11
- Software version: pdns 4.8.3
- Software source: Custom compiled
Steps to reproduce
- Launch powerdns via systemd with this config:
launch=bind
bind-dnssec-db=/var/pdns/dbs/pdns-metadns-lua.sqlite3
local-address=127.0.0.2
loglevel=3
zone-cache-refresh-interval=7
The unit file I use is:
[Unit]
#%I = unescaped %i = escaped
Description=PowerDNS Authoritative Server - %I
Documentation=man:pdns_server(1) man:pdns_control(1)
Documentation=https://doc.powerdns.com
Wants=network-online.target
After=network-online.target mysql.service mysqld.service postgresql.service slapd.service mariadb.service time-sync.target
[Service]
ExecStart=/usr/local/sbin/pdns_server --guardian=no --daemon=no --disable-syslog --log-timestamp=no --write-pid=no --config-dir=/etc/powerdns/ --config-name=%i
SyslogIdentifier=pdns_server_%i
User=pdns
Group=pdns
Type=simple
Restart=always
RestartSec=1
StartLimitInterval=0
RuntimeDirectory=pdns
RuntimeDirectoryPreserve=yes
# Sandboxing
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_CHOWN
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_CHOWN
LockPersonality=true
NoNewPrivileges=true
PrivateDevices=true
PrivateTmp=true
# Setting PrivateUsers=true prevents us from opening our sockets
ProtectClock=true
ProtectControlGroups=true
ProtectHome=false
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
# ProtectSystem=full will disallow write access to /etc and /usr, possibly
# not being able to write slaved-zones into sqlite3 or zonefiles.
ProtectSystem=full
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
SystemCallArchitectures=native
SystemCallFilter=~ @clock @debug @module @mount @raw-io @reboot @swap @cpu-emulation @obsolete
ProtectProc=invisible
PrivateIPC=true
RemoveIPC=true
DevicePolicy=closed
# Not enabled by default because it does not play well with LuaJIT
# MemoryDenyWriteExecute=true
[Install]
WantedBy=multi-user.target
And then systemctl start pdns@test
2. Watch the number of open files with while [ 1==1 ]; do lsof -p pgrep -f config-name=test | wc -l; sleep 8; done
Expected behaviour
Previous sqlite3 instances would be closed after each zone-cache-refresh-interval
Actual behaviour
File descriptors remain open
Other information
I can only reproduce this when running under systemd - if run manually it doesn't seem to happen.
Hi! Thank you for reporting this!
Just to be sure, if you look at the actual new lines in lsof, not just the number of lines, you do see file descriptors to the SQLite file, right?
Anything in the PowerDNS logs?
Would you mind sharing the content of /etc/powerdns/metadns/bind.conf as well?
Just to be sure, if you look at the actual new lines in lsof, not just the number of lines, you do see file descriptors to the SQLite file, right?
Correct, a new file descriptor appears every 7 seconds:
pdns_serv 169689 pdns 34rr REG 254,0 28672 35665477 /var/pdns/dbs/pdns-metadns-lua.sqlite3
Anything in the PowerDNS logs?
No, even with loglevel=10 nothing gets displayed
Would you mind sharing the content of /etc/powerdns/metadns/bind.conf as well?
I've updated the initial post, this reproduces even without that line in the config
Interestingly, I do see pdns calling sqlite3_close if I look at bpftrace:
# bpftrace -p `pgrep -f config-name=test` -e 'uprobe:/usr/local/lib/pdns/libsqlite3.so.0:sqlite3_open{ time("%F %T"); print("in sqlite3_open\n"); } uprobe:/usr/local/lib/pdns/libsqlite3.so.0:sqlite3_close{ time("%F %T"); print("in sqlite3_close\n"); }'
Attaching 2 probes...
2024-03-19 15:04:40in sqlite3_open
2024-03-19 15:04:40in sqlite3_close
2024-03-19 15:04:47in sqlite3_open
2024-03-19 15:04:47in sqlite3_close
2024-03-19 15:04:54in sqlite3_open
2024-03-19 15:04:54in sqlite3_close
Maybe the code should use sqlite3_close_v2 instead
https://github.com/sqlite/sqlite/blob/1fe31dcfabf886517e41cbab3b8435e0e828b44f/src/main.c#L1346-L1347 https://github.com/sqlite/sqlite/blob/1fe31dcfabf886517e41cbab3b8435e0e828b44f/src/main.c#L1275-L1283
The sqlite3_close_v2() interface is intended for use with host languages that are garbage collected, and where the order in which destructors are called is arbitrary.
I'm completely at a loss as to what it means.
~wrong tree~
According https://www.sqlite.org/c3ref/open.html you still need to close the DB if an error is returned on opening. The DB open code throws on error without closing.
But even if that would be the case (file exists but cannot be opened for some reason), you would expect something in the logs.
@devicenull can you show the output of ls -l /var/pdns/dbs/pdns-metadns-lua.sqlite3 ?
lrwxrwxrwx 1 root root 63 Mar 18 20:45 /var/pdns/dbs/pdns-metadns-lua.sqlite3 -> /root/roles/metadns/var/pdns/dbs/pdns-metadns-lua.sqlite3
The user of the service (in description) isn't root -- I presume that pdns can't access that file.
Some troubleshooting in IRC figured this out:
If pdns does not have write access to the sqlite3 files, it will leak file descriptors like this. Functionally, the server works fine otherwise.
So, you can reproduced this with chown root:root *sqlite3*; chmod 0644 *sqlite3* and start pdns as non-root.
The systemd bit was unrelated - in my tests pdns was running as root when I ran it manually, but as pdns when run via systemd
Okay, I dug deep into sqlite internals, and came up with this:
Full reproduction requirements:
- BIND DNSSEC database and/or sqlite backend
- sqlite database file not writable by pdns
- Respective journal mode set to WAL (default)
- WAL file already exists
- Something that creates and destroys backends, like
zone-cache-refresh-interval
Explanation:
- When sqlite opens a database with a WAL journal, it takes a shared lock on the database file for the life of the database connection, or until the journal mode is changed to not-WAL.
- Sqlite locks are implemented with
fcntl(..., F_SETLK, ...) fcntl(..., F_SETLK, ...)locks will be dropped when any file descriptor pointing to the file is closed in the process holding the lock, even if it wasn't the file descriptor used to acquire the lock.- Sqlite manages this by saving file descriptors pointing to databases on database closure if there are still locks outstanding. These saved file descriptors will be reused by a later open of the same database file if the open mode matches.
- If you attempt to open a database file read-write that you cannot write to, sqlite automatically falls back to opening it read-only, but does not attempt to reuse a saved file descriptor in the fallback.
- PDNS makes multiple persistent connections to the sqlite database as part of backend instances that are long lived. When using the default WAL journal mode, this ensures there will always be a shared lock on the database file outstanding.
- Features that create new backend instances and then later destroy them result in new connections to the database that are later closed when the backend instance is destroyed.
Possible fixes:
- Sqlite could fix the fallback read-only open to retry attempting to reuse a file descriptor
- PDNS could check whether the database can be opened read-write, and then tell sqlite to only open the database read-only if it can't. This will allow reuse of an existing read-only file descriptor.
- PDNS could add an option to explicitly indicate the database is expected to be read-only and check whether the database is read-write, and abort if there is a mismatch.
Sqlite could fix the fallback read-only open to retry attempting to reuse a file descriptor
That change was made just moments ago:
https://sqlite.org/src/info/a678e854 (reported via https://sqlite.org/forum/forumpost/4b2e1fd222)