Support for containers running on macvlan networks

Open lox opened this issue 1 year ago • 6 comments

I have containers that run on macvlan networks (to allow broadcast/multicast easily). I'd love to be able to use whalewall to firewall off web admin ports, but allow access to things like DNS.

An example might be AdGuard, which needs 53/tcp and 53/udp accessible, but I'd like to limit access to 80/443 to only a specific network (the Traefik network rather than the macvlan network).

Might something like that be in-scope for whalewall?

lox avatar May 14 '23 11:05 lox

I've never used macvlan networks, but that should be possible now. Try specifying the network on the output rules that you want to limit to a single network, and let me know if that works.

capnspacehook avatar May 14 '23 12:05 capnspacehook

Interesting, looks like macvlan containers aren't subject to a lot of the standard iptables chains and need something like this: https://github.com/deitch/ctables

lox avatar May 14 '23 21:05 lox

I've been looking into something similar and discovered whalewall. I wonder if it'd be possible to move all the rules into the container's network namespace. The NetworkSettings.SandboxKey, from Docker inspect, seems to point to a file related to the network namespace. I haven't experimented enough yet, but I wonder if opening that file is all that would be needed to use it with nftables.WithNetNSFd.
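
As a rough sketch of the first step of that idea, the Go program below (standard library only) pulls NetworkSettings.SandboxKey out of the JSON array that docker inspect prints. The sandboxKey helper and the sample JSON are made up for illustration. Opening the returned path and handing the descriptor to nftables.WithNetNSFd is only described in a comment, since that part is untested here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// inspectEntry mirrors only the part of `docker inspect` output used here.
type inspectEntry struct {
	NetworkSettings struct {
		SandboxKey string `json:"SandboxKey"`
	} `json:"NetworkSettings"`
}

// sandboxKey (hypothetical helper) extracts NetworkSettings.SandboxKey from
// the JSON array printed by `docker inspect <container>`.
func sandboxKey(inspectJSON []byte) (string, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(inspectJSON, &entries); err != nil {
		return "", fmt.Errorf("decoding inspect output: %w", err)
	}
	if len(entries) == 0 {
		return "", errors.New("inspect output contains no containers")
	}
	key := entries[0].NetworkSettings.SandboxKey
	if key == "" {
		return "", errors.New("SandboxKey is empty")
	}
	return key, nil
}

func main() {
	// Illustrative sample, not output captured from a real container.
	sample := []byte(`[{"NetworkSettings":{"SandboxKey":"/var/run/docker/netns/0123456789ab"}}]`)

	key, err := sandboxKey(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Untested next step (assumption): open this path with os.Open and pass
	// int(f.Fd()) to nftables.WithNetNSFd when creating the nftables connection.
	fmt.Println("network namespace file:", key)
}
```

Whether the process is allowed to open that file and enter the namespace depends on where it runs and which privileges it has, which this sketch does not address.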

Bonus: no cleanup would be needed when a container exits, as all the rules should go away along with the namespace when the namespace is destroyed.

esev avatar Aug 08 '23 16:08 esev

This should be possible; the only downside is the nftables rules could be viewed or modified by the root user in the Docker container, since the rules would be in the container's namespace instead of the host's.

capnspacehook avatar Aug 22 '23 23:08 capnspacehook

> the only downside is the nftables rules could be viewed or modified by the root user in the Docker container,

I think this would only be possible for containers with the NET_ADMIN capability. But I agree, the more generic solution would be the one implemented today.
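
To make the NET_ADMIN point concrete, here is a small Go sketch that decides whether a container would end up with NET_ADMIN given its privileged flag and its cap_add / cap_drop lists. It is a simplified model of how I understand Docker combines those lists (NET_ADMIN is not in Docker's default capability set), not Docker's actual code, and hasNetAdmin is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize upper-cases a capability name and strips an optional CAP_ prefix,
// so "cap_net_admin" and "NET_ADMIN" compare equal.
func normalize(c string) string {
	return strings.TrimPrefix(strings.ToUpper(c), "CAP_")
}

// contains reports whether list includes the (already normalized) capability.
func contains(list []string, want string) bool {
	for _, c := range list {
		if normalize(c) == want {
			return true
		}
	}
	return false
}

// hasNetAdmin is a simplified, assumed model of the resulting capability set:
//   - privileged containers get every capability;
//   - cap_add: ALL grants NET_ADMIN unless cap_drop names it explicitly;
//   - otherwise NET_ADMIN is present only if cap_add names it, because it is
//     not part of the default set.
func hasNetAdmin(privileged bool, capAdd, capDrop []string) bool {
	switch {
	case privileged:
		return true
	case contains(capAdd, "ALL"):
		return !contains(capDrop, "NET_ADMIN")
	default:
		return contains(capAdd, "NET_ADMIN")
	}
}

func main() {
	// Root with default capabilities: cannot modify nftables rules.
	fmt.Println(hasNetAdmin(false, nil, nil))
	// cap_drop: ALL plus cap_add: NET_ADMIN: can modify nftables rules.
	fmt.Println(hasNetAdmin(false, []string{"NET_ADMIN"}, []string{"ALL"}))
	// Privileged container: can modify nftables rules.
	fmt.Println(hasNetAdmin(true, nil, nil))
}
```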

esev avatar Aug 23 '23 01:08 esev

> Bonus: no cleanup would be needed when a container exits, as all the rules should go away along with the namespace when the namespace is destroyed.

I did a little experiment with firewall rules in the container namespace. It is inspired by how Kubernetes/Podman pods work.

Upsides:

  1. No need to mount the Docker socket
  2. No Docker polling/event listening
  3. Rules are guaranteed to be set up before the restricted container starts
  4. Plain nftables rules
  5. Cleans up after itself

Downsides:

  1. More yaml
  2. Extra containers (1 stopped + 1 running per network namespace)

services:

  pod:
    # Long running process to keep the network namespace alive
    # Stopping (or restarting) this container will kill the network in dependent containers
    command: sleep infinity
    image: alpine
    # Use a non-root user
    user: 1000:1000
    # Use init, as sleep isn't intended to run as PID 1
    # https://daveiscoding.com/why-do-you-need-an-init-process-inside-your-docker-container-pid-1
    init: true
    # Drop all capabilities and make read-only, this container does nothing
    cap_drop:
      - ALL
    read_only: true
    # Bonus: block DNS tunneling by disabling DNS forwarding (forward to 0.0.0.0)
    # Containers can still resolve hostnames of containers on the same network (e.g. hostname_demo),
    # but they can't resolve public domains such as google.com
    # However, this also affects /etc/resolv.conf inside the firewall container
    # https://github.com/moby/moby/issues/19474#issuecomment-276406305
    dns: 0.0.0.0

  firewall:
    # Use firewalld, or any other image with nftables installed
    image: quay.io/firewalld/firewalld
    # Run as root user
    user: 0:0
    # Give only the required NET_ADMIN capability
    cap_drop:
      - ALL
    cap_add: 
      - NET_ADMIN
    depends_on:
      pod:
        # Compose should restart the firewall after it updates pod
        # This applies to explicit restart controlled by a Compose operation only!
        restart: true
        condition: service_started
    # Join the network namespace of pod
    network_mode: service:pod
    command: >
      sh -c "
            echo 'Set up a new table and chain...' &&
            nft add table inet filter &&
            nft add chain inet filter output { type filter hook output priority 0\\; } &&
            
            echo 'Allow private network ranges...' &&
            nft add rule inet filter output ip daddr 10.0.0.0-10.255.255.255 accept &&
            nft add rule inet filter output ip daddr 172.16.0.0-172.31.255.255 accept &&
            nft add rule inet filter output ip daddr 192.168.0.0-192.168.255.255 accept &&
            nft add rule inet filter output ip daddr 127.0.0.0-127.255.255.255 accept &&
            
            echo 'Allow 1.1.1.1 as example...' &&
            nft add rule inet filter output ip daddr 1.1.1.1 accept &&
            
            echo 'Drop connections to all other IP addresses...' &&
            nft add rule inet filter output drop &&

            nft list table inet filter
          "

  restricted:
    image: alpine
    # Start this container once the firewall has been set up
    depends_on:
      firewall:
        restart: true
        condition: service_completed_successfully
    network_mode: service:pod
    # Run as regular user
    user: 1000:1000
    init: true
    # Without any capabilities
    cap_drop:
      - ALL
    # Show the firewall is effective
    command: >
      sh -c "
            echo 'Resolving internal hostname works:'
            ping hostname_demo -c 1
            echo
            echo 'Resolving public domain fails:'
            wget -T 1 -t 1 google.com -O - | head
            echo
            echo 'Accessing allowed IP address works:'
            wget 1.1.1.1 -O - | head -n 21 | tail -n 17
            echo
            echo 'Accessing any other IP address fails:'
            wget -T 1 -t 1 40.89.244.232 -O - | head
            exit 0
          "

  # Bonus: another container on the same network to demo resolving internal DNS names
  hostname_demo:
    image: alpine
    user: 1000:1000
    init: true
    cap_drop:
      - ALL
    read_only: true
    command: sleep infinity

  # Bonus: try wiping the firewall rules as root user
  try_firewall_wipe:
    image: quay.io/firewalld/firewalld
    depends_on:
      restricted:
        restart: true
        condition: service_completed_successfully
    network_mode: service:pod
    user: 0:0
    init: true
    cap_drop:
      - ALL
    # Root can't wipe the firewall rules without the NET_ADMIN capability
    command: >
      sh -c "
            echo 'Try wiping the firewall as root user...'
            nft flush ruleset
            echo 'Demo is finished!'
          "
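
For anyone who wants to reason about which destinations the output chain above lets through, here is a minimal Go model of it (standard library only). It mirrors the ranges from the firewall service as CIDR prefixes and applies the same logic: accept on the first match, otherwise drop. The verdict helper is hypothetical and only models the rules; it does not talk to nftables.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allowed mirrors the accept rules of the output chain, written as CIDR
// prefixes instead of address ranges.
var allowed = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),     // 10.0.0.0-10.255.255.255
	netip.MustParsePrefix("172.16.0.0/12"),  // 172.16.0.0-172.31.255.255
	netip.MustParsePrefix("192.168.0.0/16"), // 192.168.0.0-192.168.255.255
	netip.MustParsePrefix("127.0.0.0/8"),    // 127.0.0.0-127.255.255.255
	netip.MustParsePrefix("1.1.1.1/32"),     // the example public address
}

// verdict (hypothetical helper) returns "accept" if dst matches one of the
// allowed prefixes and "drop" otherwise, like the final drop rule.
func verdict(dst string) (string, error) {
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", dst, err)
	}
	for _, p := range allowed {
		if p.Contains(addr) {
			return "accept", nil
		}
	}
	return "drop", nil
}

func main() {
	for _, dst := range []string{"172.31.255.255", "172.32.0.1", "1.1.1.1", "40.89.244.232"} {
		v, err := verdict(dst)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-16s %s\n", dst, v)
	}
}
```

In this model IPv6 destinations always come out as drop, because none of the prefixes are IPv6; that matches the chain above, whose accept rules all use ip daddr.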

Jip-Hop avatar Feb 07 '24 11:02 Jip-Hop