Weave node cannot connect to peer due to a duplicate iptables rule on peer
What you expected to happen?
Weave peers must remain connected.
What happened?
We're running five containers that are managed by Docker Swarm. These containers are the only ones on the Weave network. After some time, a random node may become unreachable from its peers.
How to reproduce it?
I cannot reproduce it, but my hypothesis is that when a node (IP address 1.2.3.4 in this example) is under heavy load and possibly runs out of memory, the Docker daemon is killed abruptly rather than shut down gracefully. Once Docker restarts, a duplicate iptables rule appears; for example, the following rule is present twice:
-A WEAVE-IPSEC-IN -s 4.5.6.7/32 -d 1.2.3.4/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
Rebooting the affected node fixes the issue, since the dynamic iptables rules are added correctly on a fresh boot.
It may be possible to trigger this by sending a kill -9 to the Docker daemon.
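To check for the duplicate-rule condition described above without reading the whole dump by eye, something like the following could be used. This is only a sketch: find_duplicate_rules is a hypothetical helper name, not part of Weave or iptables. It assumes the usual iptables-save output format, where a line starting with "*" opens a table and each "-A ..." line appends one rule.

```shell
#!/bin/sh
# Hypothetical helper (not part of Weave or iptables): reads iptables-save
# output on stdin and prints every "-A" rule that occurs more than once
# within the same table, prefixed by its count and the table name.
find_duplicate_rules() {
    awk '
        /^\*/  { table = substr($0, 2); next }    # "*filter", "*nat", ... opens a table
        /^-A / { count[table " " $0]++ }          # count identical rules per table
        END    {
            for (key in count)
                if (count[key] > 1)
                    print count[key] "x " key
        }
    '
}
```

Piping the output of sudo iptables-save into find_duplicate_rules on the affected node, before and after the Docker daemon is killed and restarted, should show whether the restart is what introduces the second copy of the WEAVE-IPSEC-IN rule. No output means no rule is duplicated within any table.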
Anything else we need to know?
Docker Swarm.
Versions:
$ weave version
2.8.0
$ docker version
20.10.12
$ uname -a
Linux amsterdam3 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
$ kubectl version
Logs:
$ docker logs weave
# No logs unfortunately since it is working correctly now. I will have to wait for the error to appear again.
or, if using Kubernetes:
$ kubectl logs -n kube-system <weave-net-pod> weave
Network:
$ ip route
$ ip -4 -o addr
$ sudo iptables-save
# I have to wait for the error to manifest again, but I'm pretty sure it's because of the duplicate iptables rule.