pdns icon indicating copy to clipboard operation
pdns copied to clipboard

File descriptor leak with bind-dnssec-db when sqlite database is not writable

Open devicenull opened this issue 1 year ago • 13 comments

  • Program: Authoritative
  • Issue type: Bug report

Short description

pdns_server seems to leak file descriptors to the sqlite3 database. I can reliably reproduce this happening once every zone-cache-refresh-interval. By leak I mean it opens the database again and does not close previous iterations of the database.

Environment

  • Operating system: Debian 11
  • Software version: pdns 4.8.3
  • Software source: Custom compiled

Steps to reproduce

  1. Launch powerdns via systemd with this config:
launch=bind
bind-dnssec-db=/var/pdns/dbs/pdns-metadns-lua.sqlite3
local-address=127.0.0.2
loglevel=3
zone-cache-refresh-interval=7

The unit file I use is:

[Unit]
#%I = unescaped %i = escaped
Description=PowerDNS Authoritative Server - %I
Documentation=man:pdns_server(1) man:pdns_control(1)
Documentation=https://doc.powerdns.com
Wants=network-online.target
After=network-online.target mysql.service mysqld.service postgresql.service slapd.service mariadb.service time-sync.target

[Service]
ExecStart=/usr/local/sbin/pdns_server --guardian=no --daemon=no --disable-syslog --log-timestamp=no --write-pid=no --config-dir=/etc/powerdns/ --config-name=%i
SyslogIdentifier=pdns_server_%i
User=pdns
Group=pdns
Type=simple
Restart=always
RestartSec=1
StartLimitInterval=0
RuntimeDirectory=pdns
RuntimeDirectoryPreserve=yes

# Sandboxing
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_CHOWN
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_CHOWN
LockPersonality=true
NoNewPrivileges=true
PrivateDevices=true
PrivateTmp=true
# Setting PrivateUsers=true prevents us from opening our sockets
ProtectClock=true
ProtectControlGroups=true
ProtectHome=false
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
# ProtectSystem=full will disallow write access to /etc and /usr, possibly
# not being able to write slaved-zones into sqlite3 or zonefiles.
ProtectSystem=full
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
SystemCallArchitectures=native
SystemCallFilter=~ @clock @debug @module @mount @raw-io @reboot @swap @cpu-emulation @obsolete
ProtectProc=invisible
PrivateIPC=true
RemoveIPC=true
DevicePolicy=closed
# Not enabled by default because it does not play well with LuaJIT
# MemoryDenyWriteExecute=true

[Install]
WantedBy=multi-user.target

And then systemctl start pdns@test 2. Watch the number of open files with while [ 1==1 ]; do lsof -p pgrep -f config-name=test | wc -l; sleep 8; done

Expected behaviour

Previous sqlite3 instances would be closed after each zone-cache-refresh-interval

Actual behaviour

File descriptors remain open

Other information

I can only reproduce this when running under systemd - if run manually it doesn't seem to happen.

devicenull avatar Mar 19 '24 14:03 devicenull

Hi! Thank you for reporting this! Just to be sure, if you look at the actual new lines in lsof, not just the number of lines, you do see file descriptors to the SQLite file, right? Anything in the PowerDNS logs? Would you mind sharing the content of /etc/powerdns/metadns/bind.conf as well?

rgacogne avatar Mar 19 '24 14:03 rgacogne

Just to be sure, if you look at the actual new lines in lsof, not just the number of lines, you do see file descriptors to the SQLite file, right?

Correct, a new file descriptor appears every 7 seconds:

pdns_serv 169689 pdns   34rr  REG              254,0     28672   35665477 /var/pdns/dbs/pdns-metadns-lua.sqlite3

Anything in the PowerDNS logs?

No, even with loglevel=10 nothing gets displayed

Would you mind sharing the content of /etc/powerdns/metadns/bind.conf as well?

I've updated the initial post, this reproduces even without that line in the config

devicenull avatar Mar 19 '24 15:03 devicenull

Interestingly, I do see pdns calling sqlite3_close if I look at bpftrace:

# bpftrace -p `pgrep -f config-name=test` -e 'uprobe:/usr/local/lib/pdns/libsqlite3.so.0:sqlite3_open{ time("%F %T"); print("in sqlite3_open\n");  } uprobe:/usr/local/lib/pdns/libsqlite3.so.0:sqlite3_close{ time("%F %T"); print("in sqlite3_close\n");  }'
Attaching 2 probes...
2024-03-19 15:04:40in sqlite3_open

2024-03-19 15:04:40in sqlite3_close

2024-03-19 15:04:47in sqlite3_open

2024-03-19 15:04:47in sqlite3_close

2024-03-19 15:04:54in sqlite3_open

2024-03-19 15:04:54in sqlite3_close

devicenull avatar Mar 19 '24 15:03 devicenull

Maybe the code should use sqlite3_close_v2 instead

https://github.com/sqlite/sqlite/blob/1fe31dcfabf886517e41cbab3b8435e0e828b44f/src/main.c#L1346-L1347 https://github.com/sqlite/sqlite/blob/1fe31dcfabf886517e41cbab3b8435e0e828b44f/src/main.c#L1275-L1283

jsoref avatar Mar 19 '24 15:03 jsoref

The sqlite3_close_v2() interface is intended for use with host languages that are garbage collected, and where the order in which destructors are called is arbitrary.

I'm completely at a loss as to what it means.

rgacogne avatar Mar 19 '24 15:03 rgacogne

~wrong tree~

jsoref avatar Mar 19 '24 15:03 jsoref

According https://www.sqlite.org/c3ref/open.html you still need to close the DB if an error is returned on opening. The DB open code throws on error without closing.

But even if that would be the case (file exists but cannot be opened for some reason), you would expect something in the logs.

omoerbeek avatar Mar 19 '24 17:03 omoerbeek

@devicenull can you show the output of ls -l /var/pdns/dbs/pdns-metadns-lua.sqlite3 ?

omoerbeek avatar Mar 19 '24 17:03 omoerbeek

lrwxrwxrwx 1 root root 63 Mar 18 20:45 /var/pdns/dbs/pdns-metadns-lua.sqlite3 -> /root/roles/metadns/var/pdns/dbs/pdns-metadns-lua.sqlite3

devicenull avatar Mar 19 '24 17:03 devicenull

The user of the service (in description) isn't root -- I presume that pdns can't access that file.

jsoref avatar Mar 19 '24 17:03 jsoref

Some troubleshooting in IRC figured this out:

If pdns does not have write access to the sqlite3 files, it will leak file descriptors like this. Functionally, the server works fine otherwise.

So, you can reproduced this with chown root:root *sqlite3*; chmod 0644 *sqlite3* and start pdns as non-root.

The systemd bit was unrelated - in my tests pdns was running as root when I ran it manually, but as pdns when run via systemd

devicenull avatar Mar 19 '24 17:03 devicenull

Okay, I dug deep into sqlite internals, and came up with this:

Full reproduction requirements:

  1. BIND DNSSEC database and/or sqlite backend
  2. sqlite database file not writable by pdns
  3. Respective journal mode set to WAL (default)
  4. WAL file already exists
  5. Something that creates and destroys backends, like zone-cache-refresh-interval

Explanation:

  1. When sqlite opens a database with a WAL journal, it takes a shared lock on the database file for the life of the database connection, or until the journal mode is changed to not-WAL.
  2. Sqlite locks are implemented with fcntl(..., F_SETLK, ...)
  3. fcntl(..., F_SETLK, ...) locks will be dropped when any file descriptor pointing to the file is closed in the process holding the lock, even if it wasn't the file descriptor used to acquire the lock.
  4. Sqlite manages this by saving file descriptors pointing to databases on database closure if there are still locks outstanding. These saved file descriptors will be reused by a later open of the same database file if the open mode matches.
  5. If you attempt to open a database file read-write that you cannot write to, sqlite automatically falls back to opening it read-only, but does not attempt to reuse a saved file descriptor in the fallback.
  6. PDNS makes multiple persistent connections to the sqlite database as part of backend instances that are long lived. When using the default WAL journal mode, this ensures there will always be a shared lock on the database file outstanding.
  7. Features that create new backend instances and then later destroy them result in new connections to the database that are later closed when the backend instance is destroyed.

Possible fixes:

  1. Sqlite could fix the fallback read-only open to retry attempting to reuse a file descriptor
  2. PDNS could check whether the database can be opened read-write, and then tell sqlite to only open the database read-only if it can't. This will allow reuse of an existing read-only file descriptor.
  3. PDNS could add an option to explicitly indicate the database is expected to be read-only and check whether the database is read-write, and abort if there is a mismatch.

dwfreed avatar Mar 19 '24 22:03 dwfreed

Sqlite could fix the fallback read-only open to retry attempting to reuse a file descriptor

That change was made just moments ago:

https://sqlite.org/src/info/a678e854 (reported via https://sqlite.org/forum/forumpost/4b2e1fd222)

sgbeal avatar Mar 21 '24 11:03 sgbeal