Failure to renew with "ps_root_recvmsg: Network is unreachable"
dhcpcd 10.1.0
I have 2 interfaces. eno5 is connected to fiber provider and eno8.2 to starlink. Once in a while, "something" happens dhcpcd during renewal process of eno5 address drops default route through it, which is followed by a number of "ps_root_recvmsg: Network is unreachable" messages and failure to get a new lease. Insisting on renewing lease by running dhcpcd -n eno5 resolves problem
2025-01-22T12:26:13.832108-08:00 router dhcpcd[974484]: eno8.2: renewing lease of 100.xx.xxx.xxx
2025-01-22T12:26:13.832984-08:00 router dhcpcd[974484]: eno8.2: sending REQUEST (xid 0x788daebc), next in 4.2 seconds
2025-01-22T12:26:13.833329-08:00 router dhcpcd[974484]: eno8.2: acknowledged 100.xx.xxx.xxx from 100.64.0.1
2025-01-22T12:26:13.833561-08:00 router dhcpcd[974484]: eno8.2: leased 100.xx.xxx.xxx for 300 seconds
2025-01-22T12:26:13.833915-08:00 router dhcpcd[974484]: eno8.2: renew in 150 seconds, rebind in 262 seconds
2025-01-22T12:26:13.834133-08:00 router dhcpcd[974484]: eno8.2: writing lease: /var/lib/dhcpcd/eno8.2.lease
2025-01-22T12:26:13.834336-08:00 router dhcpcd[974484]: eno8.2: adding IP address 100.xx.xxx.xxx/10 broadcast 100.127.255.255
2025-01-22T12:26:13.834547-08:00 router dhcpcd[974484]: eno8.2: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks RENEW
2025-01-22T12:28:43.455886-08:00 router dhcpcd[974484]: eno5: renewing lease of 47.yyy.yyy.78
2025-01-22T12:28:43.456488-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 4.4 seconds
2025-01-22T12:28:43.834049-08:00 router dhcpcd[974484]: eno8.2: renewing lease of 100.xx.xxx.xxx
2025-01-22T12:28:43.840894-08:00 router dhcpcd[974484]: eno8.2: sending REQUEST (xid 0x56b9244), next in 3.0 seconds
2025-01-22T12:28:43.841049-08:00 router dhcpcd[974484]: eno8.2: acknowledged 100.xx.xxx.xxx from 100.64.0.1
2025-01-22T12:28:43.841186-08:00 router dhcpcd[974484]: eno8.2: leased 100.xx.xxx.xxx for 300 seconds
2025-01-22T12:28:43.841315-08:00 router dhcpcd[974484]: eno8.2: renew in 150 seconds, rebind in 262 seconds
2025-01-22T12:28:43.841448-08:00 router dhcpcd[974484]: eno8.2: writing lease: /var/lib/dhcpcd/eno8.2.lease
2025-01-22T12:28:43.841591-08:00 router dhcpcd[974484]: eno8.2: adding IP address 100.xx.xxx.xxx/10 broadcast 100.127.255.255
2025-01-22T12:28:43.841714-08:00 router dhcpcd[974484]: eno5: deleting route to 47.yyy.yyy.0/23
2025-01-22T12:28:43.841889-08:00 router dhcpcd[974484]: eno5: deleting default route via 47.yyy.yyy.1
2025-01-22T12:28:43.842030-08:00 router dhcpcd[974484]: eno8.2: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks RENEW
2025-01-22T12:28:47.831178-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 7.9 seconds
2025-01-22T12:28:47.831530-08:00 router dhcpcd[974484]: ps_root_recvmsg: Network is unreachable
2025-01-22T12:28:55.695001-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 15.7 seconds
2025-01-22T12:28:55.695600-08:00 router dhcpcd[974484]: ps_root_recvmsg: Network is unreachable
2025-01-22T12:29:11.354362-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 31.5 seconds
2025-01-22T12:29:11.354965-08:00 router dhcpcd[974484]: ps_root_recvmsg: Network is unreachable
2025-01-22T12:29:42.858769-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 63.8 seconds
2025-01-22T12:29:42.859655-08:00 router dhcpcd[974484]: ps_root_recvmsg: Network is unreachable
2025-01-22T12:29:43.462247-08:00 router dhcpcd[974484]: control command: dhcpcd -n eno5
2025-01-22T12:29:43.463884-08:00 router dhcpcd[974484]: eno5: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks RECONFIGURE
2025-01-22T12:29:43.476871-08:00 router dhcpcd[974484]: eno5: IAID ef:0c:9f:dc
2025-01-22T12:29:43.477024-08:00 router dhcpcd[974484]: eno5: delaying IPv4 for 0.3 seconds
2025-01-22T12:29:43.764028-08:00 router dhcpcd[974484]: eno5: reading lease: /var/lib/dhcpcd/eno5.lease
2025-01-22T12:29:43.764338-08:00 router dhcpcd[974484]: eno5: rebinding lease of 47.yyy.yyy.78
2025-01-22T12:29:43.764462-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0x164683cc), next in 3.6 seconds
2025-01-22T12:29:43.764784-08:00 router dhcpcd[974484]: eno5: spawned BPF BOOTP on PID 1473641
2025-01-22T12:29:44.054601-08:00 router dhcpcd[974484]: eno5: acknowledged 47.yyy.yyy.78 from 74.42.148.118
2025-01-22T12:29:44.054879-08:00 router dhcpcd[974484]: eno5: leased 47.yyy.yyy.78 for 1800 seconds
2025-01-22T12:29:44.055092-08:00 router dhcpcd[974484]: eno5: renew in 900 seconds, rebind in 1575 seconds
2025-01-22T12:29:44.055273-08:00 router dhcpcd[974484]: eno5: writing lease: /var/lib/dhcpcd/eno5.lease
2025-01-22T12:29:44.055444-08:00 router dhcpcd[974484]: eno5: adding IP address 47.yyy.yyy.78/23 broadcast 47.yyy.zzz.255
2025-01-22T12:29:44.055566-08:00 router dhcpcd[974484]: eno5: adding route to 47.yyy.yyy.0/23
2025-01-22T12:29:44.055694-08:00 router dhcpcd[974484]: eno5: adding default route via 47.yyy.yyy.1
2025-01-22T12:29:44.055809-08:00 router dhcpcd[974484]: eno5: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks RENEW
2025-01-22T12:29:44.094381-08:00 router dhcpcd[974484]: eno5: BPF BOOTP exited from PID 1473641
config is:
allowinterfaces eno8.2 eno5
debug
#hostname
duid
vendorclassid
option domain_name_servers, domain_name, domain_search
option classless_static_routes
option interface_mtu
option host_name
nogateway
noipv6rs
noipv4ll
option rapid_commit
require dhcp_server_identifier
nohook resolv.conf
interface eno5
gateway
noarp
Here's the interesting section:
2025-01-22T12:28:43.834049-08:00 router dhcpcd[974484]: eno8.2: renewing lease of 100.xx.xxx.xxx
2025-01-22T12:28:43.840894-08:00 router dhcpcd[974484]: eno8.2: sending REQUEST (xid 0x56b9244), next in 3.0 seconds
2025-01-22T12:28:43.841049-08:00 router dhcpcd[974484]: eno8.2: acknowledged 100.xx.xxx.xxx from 100.64.0.1
2025-01-22T12:28:43.841186-08:00 router dhcpcd[974484]: eno8.2: leased 100.xx.xxx.xxx for 300 seconds
2025-01-22T12:28:43.841315-08:00 router dhcpcd[974484]: eno8.2: renew in 150 seconds, rebind in 262 seconds
2025-01-22T12:28:43.841448-08:00 router dhcpcd[974484]: eno8.2: writing lease: /var/lib/dhcpcd/eno8.2.lease
2025-01-22T12:28:43.841591-08:00 router dhcpcd[974484]: eno8.2: adding IP address 100.xx.xxx.xxx/10 broadcast 100.127.255.255
2025-01-22T12:28:43.841714-08:00 router dhcpcd[974484]: eno5: deleting route to 47.yyy.yyy.0/23
2025-01-22T12:28:43.841889-08:00 router dhcpcd[974484]: eno5: deleting default route via 47.yyy.yyy.1
2025-01-22T12:28:43.842030-08:00 router dhcpcd[974484]: eno8.2: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks RENEW
2025-01-22T12:28:47.831178-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 7.9 seconds
2025-01-22T12:28:47.831530-08:00 router dhcpcd[974484]: ps_root_recvmsg: Network is unreachable
eno8.2 starts a renew and when the lease is applied the subnet and default routs are deleted from eno5 for reasons unknown - that is the bug. This causes a renew failure on eno5. eno5 should recover when T2 is reached as that will trigger a REBIND which the OP shows works when done manually. However this results in a window of 600 seconds without a useable config which isn't ideal.
is there a way that I can collect more information that will clarify situation ? I had hopes that enabling debug will be more informative in order to understand what happens.
It’s pretty clear what is happening, just not the why. Is one of the interfaces a VLAN and is it on the other interface or a different one?
different interfaces. one with vlan (eno8.2) connected via switch to starlink eno5 going directly to ONT
by "clarify" i meant to see "why". some magical extra-extra-debug setting that dumps some decision making log messages...
it seems to happen every 2-3 days. is there anything i can do in order to gather some additional information that could help to resolve it ?
Here's the interesting section:
2025-01-22T12:28:43.834049-08:00 router dhcpcd[974484]: eno8.2: renewing lease of 100.xx.xxx.xxx 2025-01-22T12:28:43.840894-08:00 router dhcpcd[974484]: eno8.2: sending REQUEST (xid 0x56b9244), next in 3.0 seconds 2025-01-22T12:28:43.841049-08:00 router dhcpcd[974484]: eno8.2: acknowledged 100.xx.xxx.xxx from 100.64.0.1 2025-01-22T12:28:43.841186-08:00 router dhcpcd[974484]: eno8.2: leased 100.xx.xxx.xxx for 300 seconds 2025-01-22T12:28:43.841315-08:00 router dhcpcd[974484]: eno8.2: renew in 150 seconds, rebind in 262 seconds 2025-01-22T12:28:43.841448-08:00 router dhcpcd[974484]: eno8.2: writing lease: /var/lib/dhcpcd/eno8.2.lease 2025-01-22T12:28:43.841591-08:00 router dhcpcd[974484]: eno8.2: adding IP address 100.xx.xxx.xxx/10 broadcast 100.127.255.255 2025-01-22T12:28:43.841714-08:00 router dhcpcd[974484]: eno5: deleting route to 47.yyy.yyy.0/23 2025-01-22T12:28:43.841889-08:00 router dhcpcd[974484]: eno5: deleting default route via 47.yyy.yyy.1 2025-01-22T12:28:43.842030-08:00 router dhcpcd[974484]: eno8.2: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks RENEW 2025-01-22T12:28:47.831178-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 7.9 seconds 2025-01-22T12:28:47.831530-08:00 router dhcpcd[974484]: ps_root_recvmsg: Network is unreachableeno8.2 starts a renew and when the lease is applied the subnet and default routs are deleted from eno5 for reasons unknown - that is the bug. This causes a renew failure on eno5. eno5 should recover when T2 is reached as that will trigger a REBIND which the OP shows works when done manually. However this results in a window of 600 seconds without a useable config which isn't ideal.
you actually missed a couple of lines above
2025-01-22T12:28:43.455886-08:00 router dhcpcd[974484]: eno5: renewing lease of 47.yyy.yyy.78 2025-01-22T12:28:43.456488-08:00 router dhcpcd[974484]: eno5: sending REQUEST (xid 0xcc8b9d2d), next in 4.4 second
eno5 also starts renew half a second before. it's just eno8.2 lease is renewed quicker even though it starts later (in all failures that i have eno5 starts renewing within second or so of eno8.2).
i tried to go through code, but my C is very-very rusty. can there be case that because eno5 is in process of renewal when rt_build runs after completion of lease renewal of eno8.2 (if i correctly understand the code), it decides to clean up some of it routes ?
Hi, As I wrote in previous comment problem happens when there are 2 renew processes run within same time interval (i have 150sec renew on startlink and 900sec on frontier. so they frequently overlap).
I tried to dig through code. Can there be case that when rt_build runs for eno8.2, when it collects current routes inet_getroutes won't return routes belonging to eno5 because it's in DHS_RENEW state and because of it they will be removed ?
@rsmarples any chance my theory from above is right and fix will be to add interface in DHS_RENEW state to be collected ?
For the IPv4 case routes are generated based on the STATE_ADDED flag which is separate from the DHCP engine state.
I've had another quick look through the code and it's still not obvious to me how it's being lost from the logs you've provided.
@rsmarples you obviously familiar with code better.
so i'll just explain what i think i see in more details, to see if it makes any sense for you
I went back from "deleting route" messages which are logged in "rt_delete". The only place that rt_delete is called from is from rt_build where during comparison of newly build tables of routes (after lease was obtained or renewed) and old one (the code under the comment "/* Remove old routes we used to manage. */")
New list of routes generate in the beginning by calling inet_getroutes which calls inet_dhcproutes per interface
TAILQ_FOREACH(ifp, ctx->ifaces, next) {
if (!ifp->active)
continue;
if (inet_dhcproutes(routes, ifp, &have_default) == -1)
return false;
In the beginning of the inet_dhcproutes there is code
if (state == NULL || state->state != DHS_BOUND || !state->added)
return 0;
at this point of time system is in state (if i understand correctly) where
eno8.2 state is DHS_BOUND. it routes are collected eno5 state is DHS_RENEW. it routes are not collected and nuked later by rt_build.
the check for STATE_ADDED is a couple of functions later in dhcp_get_routes > STATE_ADDED, but eno5 interface never gets there as inet_dhcproutes exist in the beginning.
@JognSmit thanks for pointing out how dumb I was for so many months! We should not be checking DHCP state there at all. Hopefully the above patch/branch fixes it for you.