dhcpcd icon indicating copy to clipboard operation
dhcpcd copied to clipboard

dhcpd program crash in racing between SIGTERM and REPLY6 handling

Open Sime-Zupanovic opened this issue 5 months ago • 6 comments

This is happening on dhcpcd-10.2.2 version. What seems to happen here is racing between SIGTERM and REPLY6 handling.

And below is pmd full back trace.

Core was generated by `dhcpcd: port0 [ip4] [ip6] '. Program terminated with signal SIGSEGV, Segmentation fault. #0 dhcp6_delegate_prefix (ifp=0x558af285e0) at dhcp6.c:3029 3029 dhcp6.c: No such file or directory. (gdb) bt #0 dhcp6_delegate_prefix (ifp=0x558af285e0) at dhcp6.c:3029 #1 dhcp6_bind (ifp=ifp@entry=0x558af285e0, op=op@entry=0x558691f960 "REPLY6", sfrom=sfrom@entry=0x7fe1779ac0 "fe80::5074:f2ff:feb1:a87f ") at dhcp6.c:3402 #2 0x000000558690e610 in dhcp6_recvif (len=, r=0x7fe1779cc0, sfrom=0x7fe1779ac0 "fe80::5074:f2ff:feb1:a87f ", ifp=) at dhcp6.c:3748 #3 dhcp6_recvmsg (ctx=, msg=, ia=ia@entry=0x0) at dhcp6.c:3886 #4 0x00000055869149b0 in ps_inet_dispatch (arg=, psm=, msg=) at privsep-inet.c:337 #5 0x00000055869118cc in ps_recvpsmsg (ctx=0x7fe1789e18, fd=, events=, callback=callback@entry=0x55869125b4 <ps_root_dispatchcb>, cbctx=0x7fe1789e18) at privsep.c:1181 #6 0x0000005586912434 in ps_root_dispatch (arg=, events=) at privsep-root.c:850 #7 0x00000055868e4b54 in eloop_run_ppoll (signals=0x7fe178a058, ts=, eloop=0x558af17810) at eloop.c:1106 #8 eloop_start (eloop=0x558af17810, signals=signals@entry=0x7fe178a058) at eloop.c:1228 #9 0x00000055868dda74 in main (argc=, argv=, envp=) at dhcpcd.c:2648

(gdb) bt full #0 dhcp6_delegate_prefix (ifp=0x558af285e0) at dhcp6.c:3029 j = --Type <RET> for more, q to quit, c to continue without paging-- ifo = 0x558af28910 state = 0x0 k = ia = carrier_warned = ap = i = sla = ifd = ifo = state = ap = i = j = k = ia = sla = ifd = carrier_warned = loglevel = s = #1 dhcp6_bind (ifp=ifp@entry=0x558af285e0, op=op@entry=0x558691f960 "REPLY6", sfrom=sfrom@entry=0x7fe1779ac0 "fe80::5074:f2ff:feb1:a87f ") at dhcp6.c:3402 state = 0x558af03b50 timedout = false confirmed = ia = loglevel = 6 now = {tv_sec = 1345, tv_nsec = 896085470} #2 0x000000558690e610 in dhcp6_recvif (len=, r=0x7fe1779cc0, sfrom=0x7fe1779ac0 "fe80::5074:f2ff:feb1:a87f ", ifp=) at dhcp6.c:3748 ctx = preference = 0 '\000' ol = 55461 auth_len = 24489 op = 0x558691f960 "REPLY6" state = 0x558af03b50 opt = ifo = auth = i = o = --Type <RET> for more, q to quit, c to continue without paging-- valid_op = true ctx = i = op = state = o = preference = ol = opt = ifo = valid_op = auth = auth_len = func = "dhcp6_recvif" ia = #3 dhcp6_recvmsg (ctx=, msg=, ia=ia@entry=0x0) at dhcp6.c:3886 from = len = sfrom = "fe80::5074:f2ff:feb1:a87f \000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\215\000\000\000\000" ifp = r = 0x7fe1779cc0 state = o = ol = 16 func = "dhcp6_recvmsg" duid = "\030\236x\341\177\000\000\000\003\004@\240\000\001Q\200\000\000\070@\000\000\000\000\375\245d6\323\307\000d\000\000\000\000\000\000\000\000fe80::5074:f2ff:feb1:a87f \000\242w\341\177\000\000\000\020\211\362\212U\000\000\000\020\211\363\212U\000\000\000\300\234w\341\177\000\000\000\001\000\000\000\000\000\000\000\000\001", '\000' <repeats 22 times>, "\001" dp = duid_len = #4 0x00000055869149b0 in ps_inet_dispatch (arg=, psm=, msg=) at privsep-inet.c:337 ctx = #5 0x00000055869118cc in ps_recvpsmsg (ctx=0x7fe1789e18, fd=, events=, callback=callback@entry=0x55869125b4 <ps_root_dispatchcb>, cbctx=0x7fe1789e18) at privsep.c:1181 psm = <error reading variable psm (value of type `ps_msg' requires 65760 bytes, which is more than max-value-size)> len = 246 dlen = 182 iov = {{iov_base = 0x7fe1779cc0, iov_len = 110}} msg = {msg_name = 0x7fe1779c78, msg_namelen = 28, msg_iov = 0x7fe1779c28, msg_iovlen = 1, msg_control = 0x7fe1779c98, msg_controllen = 40, msg_flags = 0} stop = false func = "ps_recvpsmsg" #6 0x0000005586912434 in ps_root_dispatch (arg=, events=) at privsep-root.c:850 --Type <RET> for more, q to quit, c to continue without paging-- ctx = func = "ps_root_dispatch" #7 0x00000055868e4b54 in eloop_run_ppoll (signals=0x7fe178a058, ts=, eloop=0x558af17810) at eloop.c:1106 nn = 0 e = 0x558af22090 n = pfd = events = n = nn = e = pfd = events = #8 eloop_start (eloop=0x558af17810, signals=signals@entry=0x7fe178a058) at eloop.c:1228 error = t = ts = {tv_sec = 1, tv_nsec = 77999580} tsp = PRETTY_FUNCTION = "eloop_start" #9 0x00000055868dda74 in main (argc=, argv=, envp=) at dhcpcd.c:2648 ctx = {pidfile = "/run/dhcpcd/port0.pid", '\000' <repeats 13 times>, vendor = "dhcpcd-10.2.2:Linux-6.1.130-manual-02457-g6b89e504bd2d:aarch64:AArch64", '\000' <repeats 185 times>, fork_fd = -1, cffile = 0x558af02340 "/etc/pnc-dhcpcd.conf", options = 310344232912607245, logfile = 0x0, argc = 5, argv = 0x7fe178a468, ifac = 0, ifav = 0x0, ifdc = 0, ifdv = 0x0, ifc = 1, ifv = 0x7fe178a488, ifcc = 1, ifcv = 0x558af175e0, duid_type = 0 '\000', duid = 0x558af23340 "", duid_len = 16, ifaces = 0x558af233a0, ctl_buf = 0x0, ctl_buflen = 0, ctl_bufpos = 0, ctl_extra = 0, routes = {rbt_root = 0x558af25118, rbt_ops = 0x55869314c0 <rt_compare_os_ops>, rbt_minmax = {0x558af05148, 0x558af25048}}, froutes = {rbt_root = 0x558af06a88, rbt_ops = 0x55869314e0 <rt_compare_free_ops>, rbt_minmax = {0x558af03608, 0x558af24f78}}, rt_order = 1, pf_inet_fd = 15, priv = 0x558af220d0, link_fd = 9, link_rcvbuf = 0, seq = 76, sseq = 0, sigset = {__val = {0 <repeats 16 times>}}, eloop = 0x558af17810, script = 0x558691b298 "/usr/libexec/dhcpcd-run-hooks", script_fp = 0x558af1b410, script_buf = 0x558af3a150 "PATH=/usr/bin:/usr/sbin:/bin:/sbin", script_buflen = 269, script_env = 0x558af04540, script_envlen = 49, control_fd = -1, control_unpriv_fd = -1, control_fds = {tqh_first = 0x558af22000, tqh_last = 0x558af22000}, control_sock = "/run/dhcpcd/port0.sock", '\000' <repeats 14 times>, control_sock_unpriv = "/run/dhcpcd/port0.unpriv.sock", '\000' <repeats 14 times>, control_group = 0, vivso = 0x0, vivso_len = 0, randomstate = 0x0, ps_user = 0x7fa8494668 , ps_processes = {tqh_first = 0x558af22ce0, tqh_last = 0x558af230e0}, ps_root = 0x558af22ce0, ps_inet = 0x558af22ec0, ps_ctl = 0x558af230e0, ps_data_fd = 8, ps_log_fd = -1, ps_log_root_fd = -1, ps_eloop = 0x558af22c40, ps_control = 0x558af22000, ps_control_client = 0x0, dhcp_opts = 0x558af13da0, dhcp_opts_len = 157, udp_rfd = -1, udp_wfd = -1, opt_buffer = 0x0, opt_buffer_len = 0, secret = 0x0, secret_len = 0, nd_fd = -1, ra_routers = 0x558af39730, nd_opts = 0x558af169d0, nd_opts_len = 7, dhcp6_rfd = -1, dhcp6_wfd = -1, dhcp6_opts = 0x558af1f940, dhcp6_opts_len = 84, dev_load = 0x0, dev_fd = -1, dev = 0x0, dev_handle = 0x0} ifaddrs = 0x0 ifo = 0x0 ifp = family = 0 opt = oi = 4 i = 1 logopts = t = --Type <RET> for more, q to quit, c to continue without paging-- len = pid = fork_fd = {5, 6} sig = siga = 0x0 si = 1 func = "main"

(gdb) thread apply all bt

Thread 1 (LWP 17642): #0 dhcp6_delegate_prefix (ifp=0x558af285e0) at dhcp6.c:3029 #1 dhcp6_bind (ifp=ifp@entry=0x558af285e0, op=op@entry=0x558691f960 "REPLY6", sfrom=sfrom@entry=0x7fe1779ac0 "fe80::5074:f2ff:feb1:a87f ") at dhcp6.c:3402 #2 0x000000558690e610 in dhcp6_recvif (len=, r=0x7fe1779cc0, sfrom=0x7fe1779ac0 "fe80::5074:f2ff:feb1:a87f ", ifp=) at dhcp6.c:3748 #3 dhcp6_recvmsg (ctx=, msg=, ia=ia@entry=0x0) at dhcp6.c:3886 #4 0x00000055869149b0 in ps_inet_dispatch (arg=, psm=, msg=) at privsep-inet.c:337 #5 0x00000055869118cc in ps_recvpsmsg (ctx=0x7fe1789e18, fd=, events=, callback=callback@entry=0x55869125b4 <ps_root_dispatchcb>, cbctx=0x7fe1789e18) at privsep.c:1181 #6 0x0000005586912434 in ps_root_dispatch (arg=, events=) at privsep-root.c:850 #7 0x00000055868e4b54 in eloop_run_ppoll (signals=0x7fe178a058, ts=, eloop=0x558af17810) at eloop.c:1106 #8 eloop_start (eloop=0x558af17810, signals=signals@entry=0x7fe178a058) at eloop.c:1228 #9 0x00000055868dda74 in main (argc=, argv=, envp=) at dhcpcd.c:2648

Fault seems to be in dhcp6_delegate_prefix() v10.2.2 dhcp6.c line 3029

state = D6_STATE(ifp);

/* Clear the logged flag. */
TAILQ_FOREACH(ap, &state->addrs, next) {
	ap->flags &= ~IPV6_AF_DELEGATEDLOG;
}

because state was during interface stop placed to null, what you can see from backtrace above. I would like to add here patch:

state = D6_STATE(ifp);
if (state == NULL) {
	loginfox("%s: ignoring REPLY6, DHCP6 state is null", ifp->name);
	return;
}

/* Clear the logged flag. */
TAILQ_FOREACH(ap, &state->addrs, next) {
	ap->flags &= ~IPV6_AF_DELEGATEDLOG;
}

that would guard against NULL state. This should ensure no crash in this case.

Sime-Zupanovic avatar Jul 23 '25 11:07 Sime-Zupanovic

We have delivered internally this patch from attachment

dhcp6_state_null_ignore.txt

Could you consider to have this patch in some coming official dhcpcd version?

Sime-Zupanovic avatar Aug 01 '25 10:08 Sime-Zupanovic

I'm not sure how that would fix it because there must be a DHCPv6 state when dhcp6_delegate_prefix is called.

rsmarples avatar Aug 06 '25 06:08 rsmarples

It seems that the point of failure here is SIGTERM handling, the process gets interrupted and dhcp6_freedrop gets called clearing the state data. Process returns to dhcp6_bind to finish its business but data it needs has been freed/removed. Either the function needs to be prepared that the state could change if signal received or the signal handling should be overhauled. Might be a signal to the event loop that clears data after the current function is done executing?

rmarioe avatar Aug 06 '25 13:08 rmarioe

DHCPv6 RELEASE needs to be re-worked. I am working on doing something for DHCPv4 which this can be based on.

rsmarples avatar Sep 02 '25 14:09 rsmarples

@Sime-Zupanovic does dhcpcd 10.3.0 fix the above issue?

perkelix avatar Nov 16 '25 11:11 perkelix

These issues were hard to reproduce as timing/racing problems, so we are testing new patches and I will try to see if company is willing to go on newer version as currently we are on v 10.2.2 and I was trying to test suff as patches on top of it.

Sime-Zupanovic avatar Nov 17 '25 08:11 Sime-Zupanovic