caddy-docker Add CAP_NET_ADMIN to caddy binary for 2.7

Related to https://github.com/caddyserver/dist/issues/97

May 30 '23 16:05 hairyhenderson

Not sure if this works or has the desired effect. Regular container ports are proxied by docker. So caddy would only be able to manipulate the one within the container?

(unless host networking is used)

May 30 '23 17:05 bt90

@bt90 this was an experiment that probably won't work as-is (see the failing build)

Regular container ports are proxied by docker. So caddy would only be able to manipulate the one within the container?

Right, but ultimately they still require CAP_NET_ADMIN to make certain changes.

I'm not super clear on the reason behind this... it has something to do with improved UDP performance with the QUIC implementation, though it can work without it. From a discussion @francislavoie and I were having yesterday we'll probably just have to document that users should use --cap-add=net_admin (or whatever) if they want better performance with QUIC 🤔

May 31 '23 15:05 hairyhenderson

we'll probably just have to document that users should use --cap-add=net_admin (or whatever) if they want better performance with QUIC 🤔

That in combination with --network host should yield the best performance.

https://docs.docker.com/network/host/

May 31 '23 16:05 bt90

That in combination with --network host

Probably should be cautious?

CAP_NET_ADMIN + --network host

Q1. Can an attacker gain root on my host OS using only the NET_ADMIN capability? Yes (in some cases).
2023 CVE (although doesn't appear to be too practical)

A local user (with CAP_NET_ADMIN capability) could use this flaw to crash the system or potentially escalate their privileges on the system.
Has previously been exploited for privilege escalation

CAP_NET_ADMIN allows the capability holder to modify the exposed network namespaces' firewall, routing tables, socket permissions, network interface configuration and other related settings on exposed network interfaces. It should be noted several privilege escalation vulnerabilities and other historical weaknesses have resulted from the ability to leverage this capability.

Support for --network host varies by platform too IIRC. On Windows within WSL2, while you can run a container with this option, you cannot successfully bind ports that can be reached from other processes on the WSL2 host (I think this is due to the Docker daemon running via a separate VM instance in WSL2).

Regular container ports are proxied by docker.

IIRC this depends on network mode (not just host), and daemon configuration.

Default is user-proxy: true for some time, but that can be disabled which changes the networking a bit (somewhat inconsistent when iptables is managed and kernel features used to support localhost routing).
It is planned to disable user-proxy and eventually remove the proxy in future though, along with a new replacement for managing the network instead of iptables (which fights with UFW/firewalld). This also differs on other platforms than linux, especially via the desktop GUI implementation Docker Desktop (which uses a VM even on linux IIRC).

So caddy would only be able to manipulate the one within the container

From within it's network namespace I think? The first link I provided above notes that from kernel 5.0 a feature available via the capability was relaxed to be usable outside of the root network namespace.

From a discussion @francislavoie and I were having yesterday we'll probably just have to document that users should use --cap-add=net_admin (or whatever) if they want better performance with QUIC 🤔

Since the image switched to non-root, this wouldn't be sufficient? You can only add caps for the root user AFAIK, when the container switches to a non-root user (even if the image sets that via USER instruction in Dockerfile) the caps are dropped IIRC. Unless this was about just relaxing the bounding set for =ep to work.

You can set them via file attributes as a "capability-dumb" binary (=ep), or if Caddy has capability awareness, it'd check for the capability as permitted (=p via setcap needed in a container since ambient caps aren't available) where it can then raise the permitted capability into the effective set.

I detailed what would need to be done a bit more here.
I see that it's already being added as an ambient capability in systemd dist config

I'm not a security expert, and I know it's used for other software like Fail2Ban to do it's thing, just making note of it here since the image adopted non-root user, but granting capabilities that add risk may be worth noting (especially when not absolutely needed), rather than just configuring for such implicitly 😅 (granted this requires an explicit --cap-add anyway..)

If you're going to document the --cap-add though, you could consider instead documenting the sysctl that's referenced as an alternative, assuming that's valid to set on the container like the equivalent for CAP_BIND_NET_SERVICE is (not necessary in Docker since that's already set to 0 by default making the capability unnecessary unless altered).

Dec 27 '23 07:12 polarathene

As a (happy!) user of Caddy in Docker, I would prefer to manually update the UDP receive and send buffer settings rather than have Caddy handle this with the CAP_NET_ADMIN capability on my behalf. I appreciate the focus on usability and performance, but I share @polarathene's remarks that this capability could pose security risks. Perhaps it's possible instead to include some copy-pastable lines in the documentation that make this manual effort less of a burden?

May 26 '24 09:05 childersd

Perhaps it's possible instead to include some copy-pastable lines in the documentation that make this manual effort less of a burden?

@childersd the NET_ADMIN capability is already documented in the DockerHub docs (scroll to Linux capabilities) - maybe that needs to be more clear? or something?

Given it's been almost a year, this has stalled, and given the potential security impact, I'm going to just close this.

May 26 '24 17:05 hairyhenderson

(not necessary in Docker since that's already set to 0 by default making the capability unnecessary unless altered).

👆 ~~For Docker the capability isn't required~~, as the actual need for it is better managed by a sysctl.

If this PR was accepted, you'd be enforcing a capability that cannot be opt-out when it's not needed.
There is a proper way to approach raising effective capabilities at runtime when they've been permitted, without having to enforce a kernel check prior to executing the binary (caddy --help and other commands for example wouldn't need NET_ADMIN).

EDIT: I mixed up the capability NET_ADMIN with BIND_NET_SERVICE, but both have a sysctl that can be tuned instead and NET_ADMIN is required only for an enhancement (QUIC for HTTP/3), thus shouldn't be enforced given my earlier points as it's not mandatory to run Caddy.

maybe that needs to be more clear? or something?

It's clear, although an inline comment above the capability in the compose.yaml example could inline context for why, or direct the user to the "Linux capabilities" section right above it.

The equivalent sysctl is not namespaced, thus cannot be adjusted per container. I would still advise raising the needed cap at runtime via Go rather than making it mandatory via kernel check to run caddy as this PR proposed.

May 27 '24 00:05 polarathene