
dhcpcd 10.2.4 overwrites default route when veth devices are created (by docker), causing full loss of connectivity

Open mwaddoups opened this issue 3 months ago • 2 comments

First, I want to apologise in advance: I'm almost certain the root cause is docker's interface spam rather than dhcpcd, but it is dhcpcd that ends up overwriting the default route. I'm filing this even though I haven't had time to dig into the details, so that others can catch it. I know half-researched issues are a pain, but this caught us on a remote production server, so I can't do much debugging. It may also be expected behaviour.

This issue only appeared on upgrade from 10.2.3 to 10.2.4, and prior to this I had no issues.

I believe the workaround is to exclude these devices from dhcpcd.

When docker creates new containers, it creates interfaces for those containers. It appears that sometimes, but not always, dhcpcd chooses one of those interfaces for a new default route over the existing WAN interface. This is all IPv4. See the logs below:

# I start a new docker container
Sep 24 09:56:22 bs-bt dhcpcd[2105]: vethf3dccb4: carrier lost
Sep 24 09:56:22 bs-bt kernel: docker0: port 1(vethf3dccb4) entered disabled state
Sep 24 09:56:22 bs-bt kernel: veth603f327: renamed from eth0
Sep 24 09:56:22 bs-bt dhcpcd[2105]: vethf3dccb4: deleting address fe80::a2e4:1aba:8964:3786
Sep 24 09:56:22 bs-bt kernel: docker0: port 1(vethf3dccb4) entered disabled state
Sep 24 09:56:22 bs-bt kernel: vethf3dccb4 (unregistering): left allmulticast mode
Sep 24 09:56:22 bs-bt kernel: vethf3dccb4 (unregistering): left promiscuous mode
Sep 24 09:56:22 bs-bt kernel: docker0: port 1(vethf3dccb4) entered disabled state
# Dhcpcd picks up the new veth interface, and deletes my default route
Sep 24 09:56:22 bs-bt dhcpcd[2105]: veth358d8fd: adding default route
Sep 24 09:56:22 bs-bt dhcpcd[2105]: vethf3dccb4: deleting route to 169.254.0.0/16
Sep 24 09:56:22 bs-bt dhcpcd[2105]: enp6s0: deleting route to 157.180.56.0/26
Sep 24 09:56:22 bs-bt dhcpcd[2105]: enp6s0: deleting default route via 157.180.56.1
Sep 24 09:56:22 bs-bt dhcpcd[2105]: vethf3dccb4: removing interface

This doesn't happen 100% of the time.
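Because the failure is intermittent, it can help to check which interface holds the IPv4 default route right after a container starts. The following is a minimal sketch, assuming iproute2's `ip` and a POSIX `awk` are available; the helper name `default_route_dev` is hypothetical and not part of dhcpcd or docker.

```shell
# default_route_dev: read `ip -4 route show default` output on stdin and
# print the device named on the first default route line.
# (Hypothetical helper for illustration only.)
default_route_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Example use on the affected host (enp6s0 is the WAN interface there):
#   ip -4 route show default | default_route_dev
# Anything other than enp6s0 (for example a veth* name) would indicate
# that the default route has been replaced.
```

Running `ip -4 monitor route` in a second terminal while starting a container is another way to see the route being deleted as it happens.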

As a workaround, I've added the following to my dhcpcd config:

denyinterfaces docker* br-* veth*

No other settings are present in my config.

If it's helpful for debugging, the machine's interfaces are:

bs-bt ~ $ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# WAN interface
2: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 9c:6b:00:4c:de:45 brd ff:ff:ff:ff:ff:ff
# Docker bridge
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 6e:35:0b:89:07:67 brd ff:ff:ff:ff:ff:ff
# Various veths for containers
4: veth1d8f0ff@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether de:f0:f5:51:d5:13 brd ff:ff:ff:ff:ff:ff link-netnsid 0
5: vethda3d1fe@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether fa:b9:b9:ac:26:86 brd ff:ff:ff:ff:ff:ff link-netnsid 1
# WireGuard interface
6: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/none 
7: veth7d9c4dd@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 3e:c1:89:78:02:98 brd ff:ff:ff:ff:ff:ff link-netnsid 2

This is on Gentoo Linux (stable) with standard USE flags and an otherwise standard config.

Hopefully this helps somebody!

mwaddoups avatar Sep 24 '25 09:09 mwaddoups

To update on this: this issue is very similar to #517, and the workaround above is now failing for me. I seem to have the same issue: a socket overflow, even though I'm excluding all other interfaces.

The logs do show an unexpected event that could be causing this.

Sep 24 17:10:21 bs-bt dhcpcd[3366]: dhcpcd_handlelink: unexpected event 0x0101
Sep 24 17:10:21 bs-bt dhcpcd[3366]: route socket overflowed (rcvbuflen 106496) - learning interface state
Sep 24 17:10:21 bs-bt dhcpcd[3366]: drained 127 messages
Sep 24 17:10:21 bs-bt kernel: br-61f5deaff065: port 12(veth10ee2ca) entered disabled state
Sep 24 17:10:21 bs-bt kernel: veth10ee2ca (unregistering): left allmulticast mode
Sep 24 17:10:21 bs-bt kernel: veth10ee2ca (unregistering): left promiscuous mode
Sep 24 17:10:21 bs-bt kernel: br-61f5deaff065: port 12(veth10ee2ca) entered disabled state
Sep 24 17:10:21 bs-bt dhcpcd[3366]: enp6s0: deleting route to 157.180.56.0/26
Sep 24 17:10:21 bs-bt dhcpcd[3366]: enp6s0: deleting default route via 157.180.56.1

I'll respond here if I get a workaround.

mwaddoups avatar Sep 24 '25 16:09 mwaddoups

route socket overflowed

Limiting which interfaces dhcpcd manages won't solve that issue. The only solution is to configure a bigger buffer! See link_rcvbuf in dhcpcd.8
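A larger buffer would be set with the `link_rcvbuf` option in dhcpcd's configuration file. The fragment below is a minimal sketch, assuming the config lives at `/etc/dhcpcd.conf`; the 1 MiB value is an illustrative guess chosen only because it is well above the `rcvbuflen 106496` reported in the log, not a tested recommendation.

```
# /etc/dhcpcd.conf (path assumed)
# Raise the link/route socket receive buffer above the 106496 bytes
# reported in the overflow message. 1048576 (1 MiB) is illustrative only.
link_rcvbuf 1048576
```

On Linux the kernel may cap the requested size at `net.core.rmem_max`, so that sysctl may need raising as well. After restarting dhcpcd, the thing to check is whether the "route socket overflowed" message still appears when containers start and stop.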

rsmarples avatar Nov 17 '25 12:11 rsmarples