dnsdist: newBPFFilter does not work on recent Linux kernel

Open paddg opened this issue 2 years ago • 17 comments

I know this is more of a support question, but @omoerbeek would like us to discuss it here later with @rgacogne.

  • Program: dnsdist 1.7.2
  • Issue type: Error report

Short description

After upgrading the OS from SLES15-SP3 to SLES15-SP4 and Linux kernel 5.3 to 5.14, the newBPFFilter() function in dnsdist does not work anymore.

If I start dnsdist via systemd, it fails with this error message in the journal: Fatal Lua error: [string "chunk"]:294: Caught exception: Error creating a BPF map of size 1024: Operation not permitted

Environment

  • Operating system: SLES15-SP4, Linux 5.14
  • Software version: dnsdist 1.7.2
  • Software source: self compiled

Steps to reproduce

This is the function call that doesn't work:

bpf = newBPFFilter(1024, 1024, 1024)

Other information

We already discussed this issue on #powerdns. Here is the transcript:

<winfried> Hi, in dnsdist 1.7.0 I have this Statement: bpf = newBPFFilter(1024, 1024, 1024)
<winfried> It does not work anymore in 1.7.2: Caught exception: Error creating a BPF map of size 1024: Operation not permitted
<Habbie> your syntax is correct
<Habbie> the problem lies deeper
<Habbie> probably, indeed, permissions/capabilities
<Habbie> also the formatting of that bit of docs is terrible
<ottom> yes, the notes about  net.core.optmem_max and RLIMIT_MEMLOCK and the capabilities should be grouped
<ottom> there is a PR to handle RLIMIT_MEMLOCK automatically, btw
<ottom> winfried: are you running with systemd? in that case, please show your unit file
<winfried> ottom: https://p.6core.net/p/WSBtmRcVAYaBzK47GOYZvlyd
<winfried> ottom: Linux  5.14.21
<ottom> winfried: is this error reported on startup or later when changing config?
<winfried> ottom: this is on startup
<ottom> i think you are at least missing capabilities, but i have no clue yet why this does not show on 1.7.0 for you
<winfried> ottom: because I have a new Linux kernel on that box I guess
<ottom> ah, good to know not only dndist changed
<ottom> *dnsdist
<winfried> ottom: Yes sorry about that! I'm testing a OS upgrade process
<ottom> winfried: below works for me. I do not know enough about systemd if both lines are needed:
<ottom> CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SYS_ADMIN
<ottom> AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_SYS_ADMIN
<ottom> if you also want to change the eBPF config at runtime, you also might need to use https://dnsdist.org/reference/config.html#addCapabilitiesToRetain
<ottom> alternatively, kernel.unprivileged_bpf_disabled can be tweaked. But https://dnsdist.org/advanced/ebpf.html#requirements suggest that is only relevant for kernel >= 5.15 
<winfried> ottom: still "Error creating a BPF map of size 1024: Operation not permitted" after changing CapabilityBoundingSet and AmbientCapabilities and systemctl daemon-reload 
<winfried> ottom: Do I have to call it like this: addCapabilitiesToRetain("CAP_SYS_ADMIN")
<ottom> winfried: yes, that should work, it is a single string or a list of strings
<ottom> if that does not work, I'm out of ideas
<winfried> ottom: no luck so far
<ottom> whats your value of kernel.unprivileged_bpf_disabled?
<ottom> sysctl kernel.unprivileged_bpf_disabled
<winfried> ottom: 2
<ottom> same as here, but i'm running 5.4.0
<ottom> I'm sure Remi would have a clue, but he's on vacation
<ottom> winfried: it would be nice if you can collect the info from this chat in an issue
<ottom> if you have a chance
<winfried> ottom: Yes of course I can file a GH issue

paddg avatar Aug 05 '22 12:08 paddg

It looks like the hints in the docs are not enough to get this working on a 5.14 kernel, suggesting this is a docs issue. As mentioned, the docs could use some reorganizing as well.

omoerbeek avatar Aug 05 '22 16:08 omoerbeek

Coming back to this and re-reading things, I think CAP_BPF instead of CAP_SYS_ADMIN should be used for recent (>= 5.8) kernels. Dunno why i missed that, as the unit file mentions it.

omoerbeek avatar Aug 05 '22 16:08 omoerbeek

Setting:

CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SYS_ADMIN
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_SYS_ADMIN
LimitMEMLOCK=infinity

should be enough to make it work (I just tested on a 5.18.16). CAP_BPF is a subset of CAP_SYS_ADMIN so setting CAP_SYS_ADMIN on a recent kernel should not be an issue. If it does not work we will need to test on the exact kernel provided by SLES15-SP4, as they might have done something weird in their backports.
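A quick way to check whether the capabilities from the unit file actually reached the process is to decode the CapEff mask in /proc/<pid>/status. What follows is only a sketch: has_cap and show_bpf_caps are made-up helper names, and the bit numbers (CAP_SYS_ADMIN = 21, CAP_BPF = 39) are taken from linux/capability.h.

```shell
#!/bin/sh
# Report whether a capability bit is set in a hexadecimal capability mask,
# as found in the CapBnd/CapEff lines of /proc/<pid>/status.
# Bit numbers from linux/capability.h:
#   CAP_NET_BIND_SERVICE = 10, CAP_SYS_ADMIN = 21, CAP_BPF = 39
has_cap() {
    # $1: hex mask without 0x prefix, $2: capability bit number
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical helper: show the eBPF-related effective capabilities of a PID.
# Runs in a subshell so that "mask" does not leak into the caller.
show_bpf_caps() (
    mask=$(awk '/^CapEff:/ { print $2 }' "/proc/$1/status")
    echo "CapEff=$mask CAP_SYS_ADMIN=$(has_cap "$mask" 21) CAP_BPF=$(has_cap "$mask" 39)"
)
```

Calling show_bpf_caps with the PID of a running dnsdist prints one summary line. A capability that shows up here can still be refused by a security module, so a "yes" is necessary but not sufficient.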

rgacogne avatar Aug 05 '22 20:08 rgacogne

CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SYS_ADMIN
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_SYS_ADMIN
LimitMEMLOCK=infinity

That's exactly what I tried. But it does not work. I also tried CAP_BPF. But still this error message:

Caught exception: Error creating a BPF map of size 1024: Operation not permitted

paddg avatar Aug 08 '22 05:08 paddg

I'll have to set up a SLES 15-SP4 box to test, then. In the meantime, would you be able to confirm that issuing sysctl kernel.unprivileged_bpf_disabled=0 allows dnsdist to start? It can be reverted by either rebooting or issuing sysctl kernel.unprivileged_bpf_disabled=2 afterwards.

rgacogne avatar Aug 08 '22 09:08 rgacogne

would you be able to confirm that issuing sysctl kernel.unprivileged_bpf_disabled=0 allows dnsdist to start?

Yes, I can confirm, with this setting dnsdist starts without errors.
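For readers unsure what the values of this sysctl mean, here is a small sketch that spells them out, following the kernel's sysctl documentation. The function name describe_unprivileged_bpf is just an illustration.

```shell
#!/bin/sh
# Explain a value of kernel.unprivileged_bpf_disabled.
# $1: the value, e.g. the content of /proc/sys/kernel/unprivileged_bpf_disabled
describe_unprivileged_bpf() {
    case "$1" in
        0) echo "unprivileged bpf() allowed" ;;
        1) echo "unprivileged bpf() disabled until reboot" ;;
        2) echo "unprivileged bpf() disabled, can be changed at runtime" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Usage:
#   describe_unprivileged_bpf "$(cat /proc/sys/kernel/unprivileged_bpf_disabled)"
```

With 0 the kernel no longer requires CAP_BPF or CAP_SYS_ADMIN for this kind of map creation, so a successful start with 0 suggests that the capability is missing or denied somewhere, rather than a lockable-memory limit being hit.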

paddg avatar Aug 08 '22 11:08 paddg

I have not been able to reproduce yet, I'll do more tests tomorrow:

# cat /etc/os-release
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp4"
DOCUMENTATION_URL="https://documentation.suse.com/"
# uname -a 
[...] 5.14.21-150400.24.11-default [...]
# sysctl kernel.unprivileged_bpf_disabled
kernel.unprivileged_bpf_disabled = 2
# cat /usr/local/etc/dnsdist.conf
print("a")
bpf = newBPFFilter(1024, 1024, 1024)
print(bpf)
print('b')

Aug 08 16:46:18 ip-172-31-38-199 systemd[1]: Starting DNS Loadbalancer...
Aug 08 16:46:18 ip-172-31-38-199 dnsdist[12935]: a
Aug 08 16:46:18 ip-172-31-38-199 dnsdist[12935]: userdata 0x55e58387b5a8
Aug 08 16:46:18 ip-172-31-38-199 dnsdist[12935]: b
Aug 08 16:46:18 ip-172-31-38-199 dnsdist[12935]: Configuration '/usr/local/etc/dnsdist.conf' OK!
Aug 08 16:46:18 ip-172-31-38-199 dnsdist[12935]: Configuration '/usr/local/etc/dnsdist.conf' OK!
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: a
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: userdata 0x55faeae46348
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: b
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: Listening on 127.0.0.1:53
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: dnsdist 1.7.2 comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: ACL allowing queries from: 10.0.0.0/8, 100.64.0.0/10, 127.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12, 192.168.0.0/16, ::1/128, fc00::/7, fe80::/10
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: Console ACL allowing connections from: 127.0.0.0/8, ::1/128
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: No downstream servers defined: all packets will get dropped
Aug 08 16:46:19 ip-172-31-38-199 systemd[1]: Started DNS Loadbalancer.
Aug 08 16:46:19 ip-172-31-38-199 dnsdist[12936]: Polled security status of version 1.7.2 at startup, no known issues reported: OK

[Unit]
Description=DNS Loadbalancer
Documentation=man:dnsdist(1)
Documentation=https://dnsdist.org
Wants=network-online.target
After=network-online.target

[Service]
ExecStartPre=/usr/local/bin/dnsdist --check-config
# Note: when editing the ExecStart command, keep --supervised and --disable-syslog
ExecStart=/usr/local/bin/dnsdist --supervised --disable-syslog
User=dnsdist
Group=dnsdist
SyslogIdentifier=dnsdist
Type=notify
Restart=on-failure
RestartSec=2
TimeoutStopSec=5
StartLimitInterval=0

# Tuning
TasksMax=8192
LimitNOFILE=16384
# Note: increasing the amount of lockable memory is required to use eBPF support
LimitMEMLOCK=infinity

# Sandboxing
# Note: adding CAP_SYS_ADMIN (or CAP_BPF for Linux >= 5.8) is required to use eBPF support,
# and CAP_NET_RAW to be able to set the source interface to contact a backend
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_BPF
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_BPF
LockPersonality=true
NoNewPrivileges=true
PrivateDevices=true
PrivateTmp=true
# Setting PrivateUsers=true prevents us from opening our sockets
ProtectClock=true
ProtectControlGroups=true
ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=full
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
SystemCallArchitectures=native
SystemCallFilter=~ @clock @debug @module @mount @raw-io @reboot @swap @cpu-emulation @obsolete

[Install]
WantedBy=multi-user.target

rgacogne avatar Aug 08 '22 16:08 rgacogne

It looks like I don't even need LimitMEMLOCK=infinity: adding CAP_BPF to both CapabilityBoundingSet and AmbientCapabilities with systemctl edit --full dnsdist, then issuing systemctl restart dnsdist, works for me.
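As an alternative to editing the full unit, the same two settings could go into a drop-in file. This is a sketch only: write_bpf_dropin and the file name ebpf.conf are invented for illustration, and the values mirror the unit file shown above.

```shell
#!/bin/sh
# Write a systemd drop-in that grants CAP_BPF to a service.
# $1: drop-in directory, e.g. /etc/systemd/system/dnsdist.service.d
# The file name "ebpf.conf" is an arbitrary choice.
write_bpf_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/ebpf.conf" <<'EOF'
[Service]
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_BPF
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_BPF
EOF
}
```

Run as root, write_bpf_dropin /etc/systemd/system/dnsdist.service.d followed by systemctl daemon-reload and systemctl restart dnsdist should have the same effect as the edit described above, while leaving the packaged unit file untouched.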

rgacogne avatar Aug 09 '22 08:08 rgacogne

It turned out that AppArmor is the reason it fails here.

# cat /etc/apparmor.d/usr.sbin.dnsdist
#include <tunables/global>

/usr/sbin/dnsdist {
  #include <abstractions/base>
  #include <abstractions/nameservice>

  capability net_bind_service,
  capability setgid,
  capability setuid,

  network tcp,
  network udp,

  /etc/dnsdist/** r,
  @{PROC}/@{pid}/** r,

  # Site-specific additions and overrides. See local/README for details.
  #include <local/usr.sbin.dnsdist>
}
# cat /etc/apparmor.d/local/usr.sbin.dnsdist
/etc/dnscrypt/** rw,
/home/cert/dnscrypt/** rw,

But I don't know what is missing here.
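One way to find out which capability a profile is refusing is to look for AppArmor denials in the kernel or audit log. A rough sketch, assuming the usual audit message shape that contains apparmor="DENIED" and operation="capable"; the function name apparmor_cap_denials is made up.

```shell
#!/bin/sh
# Filter log lines on stdin, keeping only AppArmor capability denials.
# Assumption: denials are logged with apparmor="DENIED" and
# operation="capable", and name the capability in a capname="..." field.
apparmor_cap_denials() {
    grep 'apparmor="DENIED"' | grep 'operation="capable"'
}

# Usage (as root), depending on where the system logs audit messages:
#   journalctl -k | apparmor_cap_denials
#   apparmor_cap_denials < /var/log/audit/audit.log
```

Any line that comes out for the dnsdist profile should name the capability that needs to be added to the profile.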

paddg avatar Aug 09 '22 09:08 paddg

echo "capability bpf," >> /etc/apparmor.d/local/usr.sbin.dnsdist
systemctl restart apparmor.service

did the trick. Now it works with your suggestions.
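Since a plain echo with >> adds the rule again on every run, here is a sketch of an idempotent variant. The helper name allow_cap_bpf is hypothetical; the path in the usage comment is the local include from this setup.

```shell
#!/bin/sh
# Append "capability bpf," to a local AppArmor include file unless a line
# with exactly that rule is already present.
# $1: path of the include file, e.g. /etc/apparmor.d/local/usr.sbin.dnsdist
allow_cap_bpf() {
    grep -qx '[[:space:]]*capability bpf,[[:space:]]*' "$1" 2>/dev/null \
        || echo 'capability bpf,' >> "$1"
}

# Usage (as root):
#   allow_cap_bpf /etc/apparmor.d/local/usr.sbin.dnsdist
#   systemctl restart apparmor.service
```

Running it twice leaves a single rule in the file, which keeps the include tidy if the step ends up in a provisioning script.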

paddg avatar Aug 09 '22 09:08 paddg

Oh, I did not suspect AppArmor, well done!

rgacogne avatar Aug 09 '22 10:08 rgacogne

Is that AppArmor policy an internal one, or can we submit a patch to it?

rgacogne avatar Aug 09 '22 10:08 rgacogne

It is internal, but it was inspired by an OBS project. I can try to send the maintainer a note about it.

paddg avatar Aug 09 '22 10:08 paddg

Will the unit file get an update as well?

paddg avatar Aug 09 '22 10:08 paddg

I'm not sure I want to grant more privileges by default, since most people are not using eBPF, but I might be wrong. In any case I am going to add a mention of AppArmor to the documentation and the unit file!

rgacogne avatar Aug 09 '22 10:08 rgacogne

In any case I am going to add a mention of AppArmor to the documentation and the unit file!

Yes, that should be sufficient.

paddg avatar Aug 09 '22 11:08 paddg

Thanks a lot for the feedback, Winfried, much appreciated!

rgacogne avatar Aug 09 '22 14:08 rgacogne

Closing since #11839 has been merged.

rgacogne avatar Nov 28 '22 14:11 rgacogne