dhcpcd segfaults during RELEASE when it tries to DISCOVER
Can you help me confirm, I understood your https://github.com/NetworkConfiguration/dhcpcd/pull/536/commits/d41c505981e1e8f493c544dff004a203fde42973#diff-66945e1e112a14eb5cb5931e2d600af419ed1f4c9a1567669f3de19d0f104c94
-
--exit sets STOPPING.
-
Each protocol sees STOPPING → calls *_release() → triggers *_deconfigure() → reason="RELEASE".
-
e.g. DHCPv6 RELEASE should be sent to network to signal removal of lease and we should receive ACK on dhcpcd
-
dhcp_free() and dhcpcd_dropped() run in correct order.
-
When all interfaces drop, the main loop exits cleanly.
If we don't have RELEASE ACK timeout callbacks (which also run from eloop) such as dhcp_discover, dhcp_start, dhcp_bind, dhcp_renew (dhcp.c) or dhcp6_* callbacks: dhcp6_start, dhcp6_recvif, dhcp6_freedrop, dhcp6_finishrelease will be triggered? I think that's how I saw mentioned dhcp_discover() crash
Program terminated with signal SIGSEGV, Segmentation fault.
#0 dhcp_discover (arg=0x5589e589a0) at dhcp.c:2006
2006 dhcp.c: No such file or directory.
(gdb) bt
#0 dhcp_discover (arg=0x5589e589a0) at dhcp.c:2006
#1 0x00000055757cce64 in eloop_start (eloop=0x5589e47830, signals=signals@entry=0x7fdf70e068) at eloop.c:1201
#2 0x00000055757c5ab4 in main (argc=
(gdb) bt full
#0 dhcp_discover (arg=0x5589e589a0) at dhcp.c:2006
ifp = 0x5589e589a0
state = 0x0
ifo = 0x5589e58bc0
#1 0x00000055757cce64 in eloop_start (eloop=0x5589e47830, signals=signals@entry=0x7fdf70e068) at eloop.c:1201
error =
I captured tcpdump during dhcp --exit, but I didn't saw DHCPv6 RELEASE? On which interface it would be sent?
I appended as well: https://github.com/NetworkConfiguration/dhcpcd/pull/536/commits/6c47ee771cabb1f0597bbf331426c2c3570b238f https://github.com/NetworkConfiguration/dhcpcd/pull/536/commits/7b15c54ddaa5c3b2672452ceff093770ee088a1f https://github.com/NetworkConfiguration/dhcpcd/pull/536/commits/ae22cf0599fcd2899474b5f676b48f8730199e52 https://github.com/NetworkConfiguration/dhcpcd/pull/536/commits/93305c28505601fda572912d97a9396b341f8909 https://github.com/NetworkConfiguration/dhcpcd/pull/536/commits/7ef3ba171c855f57df4626eee345f6a257da2d51
Please help.
Originally posted by @Sime-Zupanovic in https://github.com/NetworkConfiguration/dhcpcd/issues/536#issuecomment-3524131402
@Sime-Zupanovic So the idea is that dhcpcd guards protocol entry points with STOPPING. When an interface is stopped, dhcpcd_drop() is called and drops all protocols using dhcp_drop() and dhcp6_drop() etc. These functions should then start the release process. IF the release has not been started for the protocol for the interface then ALL timers for the interface and protocol are cleared and a new timer is started for the protocol and interface so we can timeout. So all discover, renew, rebind, expire timers should be removed.
NOTES: Only dhcp6 gets an ACK, dhcp does not and just times out. If dhcp6 does not get an ACK then the timeout is much longer than dhcp.
I appended as well:
So that we're on the same page, could you please use the git master branch rather than appending patches? Or if you must patch on top of 10.2.4, please apply a very big diff from 10.2.4 to the current tip of master as there is a lot of changes going on.
I'm having a hard time replicating this, but you appear to be able to easily. Can you please add these lines to the top of dhcpcd.conf?
debug
logfile /var/log/dhcpcd.log
restart dhcpcd, cause the crash Once done please put your full dhcpcd.conf and the logfile on this ticket.
Hi, unfortunately company politics is to use label versions. We are currently on v10.2.2, so this patching is making things hard. When will there be some new version available containing these dhcpcd gracefully shutdown stuff?
dhcpcd-10.3.0 is out now.
Ok, so complete new gracefull shutdown handling is in dhcpcd-10.3.0?
You've been asked to test the latest. 10.2.4 has been out for some time. Now we have 10.3.0 out. Why do you insist on testing against a heavily patched 10.2.2?
Ok, so complete new gracefull shutdown handling is in dhcpcd-10.3.0?
Yes!
Hi, this fault on Discover I cannot reproduce now. I moved to dhcpcd v10.3.0 and I see same issue I was seeing on patched 10.2.2 version on short interface breaks that result in dhcpcd -exit. Problem manifest in the way that main dhcpcd program hangs waiting on network proxy to close. network proxy seems to be still receiving RA packets and cannot close. So main dhcpcd is stuck in waiting to exit. I am not sure is this really a trigger, but I noticed that when fault appears we always have "carrier lost". When following code I think on "Carrier Lost" we have this sequence:
Carrier Lost → dhcpcd_handlecarrier() → dhcpcd_drop() → dhcp6_drop() → dhcp6_freedrop() → dhcpcd_dropped() → eloop_exit() → main() cleanup → ps_stop() → ps_root_stopprocesses() → ps_stopprocess()
In ps_stopprocess() we have: shutdown(psp->psp_fd, SHUT_WR). This will not give guarantee that child proxy fd is really close.
So when dhcpcd --exit is called we hang in "waiting for pid 24959 to exit"
See traces, debug is enabled:
Nov 26 18:22:53 kernel: port0: port 1(riport0.10) entered disabled state Nov 26 18:22:53 dhcpcd[24960]: port0: carrier lost Nov 26 18:22:53 pncdhcpa[15979]: RCV: 2 (NOCARRIER) Nov 26 18:22:53 dhcpcd[24960]: port0: executing: /usr/libexec/dhcpcd-run-hooks NOCARRIER Nov 26 18:22:53 dhcpcd[24960]: port0: deleting address fdaa:4ae4:f0e7:76::1/128 Nov 26 18:22:53 dhcpcd[24960]: port0: deleting default route via fe80::ae33:51ff:fe13:d556 Nov 26 18:22:53 dhcpcd[24960]: port0: executing: /usr/libexec/dhcpcd-run-hooks EXPIRE6 Nov 26 18:22:53 pncdhcpa[15982]: RCV: 19 (EXPIRE6) Nov 26 18:22:53 dhcpcd[24960]: port0: DHCP6 proxy fdaa:4ae4:f0e7:76::1 exited from PID 25026 Nov 26 18:22:53 dhcpcd[24960]: port0: deleting route to fdaa:4ae4:f0e7:76::/64 Nov 26 18:22:53 dhcpcd[24960]: port0: deleting route to fdaa:4ae4:f0e7:70::/64 Nov 26 18:22:53 dhcpcd[24960]: port0: deleting route to fdaa:4ae4:f0e7:64::/64 Nov 26 18:22:53 dhcpcd[24960]: port0: executing: /usr/libexec/dhcpcd-run-hooks ROUTERADVERT Nov 26 18:22:53 pncdhcpa[15997]: RCV: 22 (ROUTERADVERT) Nov 26 18:22:53 pncdhcpa[16003]: RCV: 18 (EXPIRE) Nov 26 18:22:53 dhcpcd[24960]: port0: executing: /usr/libexec/dhcpcd-run-hooks EXPIRE Nov 26 18:22:53 dhcpcd[24960]: port0: BPF BOOTP exited from PID 25107 Nov 26 18:22:55 kernel: port0: port 1(riport0.10) entered blocking state Nov 26 18:22:55 kernel: port0: port 1(riport0.10) entered disabled state Nov 26 18:22:55 kernel: riport0.10: entered allmulticast mode Nov 26 18:22:55 kernel: riport0: entered allmulticast mode Nov 26 18:22:55 kernel: extport0: entered allmulticast mode Nov 26 18:22:55 kernel: xilinx_axienet 80412000.bbm1a bbm1a: entered allmulticast mode Nov 26 18:22:55 kernel: riport0.10: entered promiscuous mode Nov 26 18:22:55 kernel: riport0: entered promiscuous mode Nov 26 18:22:55 kernel: extport0: entered promiscuous mode Nov 26 18:22:55 kernel: port0: port 1(riport0.10) entered blocking state Nov 26 18:22:55 kernel: port0: port 1(riport0.10) entered forwarding state Nov 26 18:22:55 dhcpcd[24960]: port0: carrier acquired Nov 26 18:22:55 dhcpcd[24960]: port0: executing: /usr/libexec/dhcpcd-run-hooks CARRIER Nov 26 18:22:55 pncdhcpa[16155]: RCV: 1 (CARRIER) Nov 26 18:22:55 dhcpcd[16154]: sending signal TERM to pid 24959 Nov 26 18:22:55 dhcpcd[16154]: waiting for pid 24959 to exit Nov 26 18:22:55 dhcpcd[24960]: port0: IAID 7f:90:46:95 Nov 26 18:22:55 dhcpcd[24960]: port0: delaying IPv6 router solicitation for 0.7 seconds Nov 26 18:22:55 dhcpcd[24960]: port0: delaying IPv4 for 1.4 seconds Nov 26 18:22:55 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:22:55 dhcpcd[24960]: port0: removing interface Nov 26 18:23:45 dhcpcd[16154]: pid 24959 failed to exit Nov 26 18:23:45 dhcpcd[23114]: sending signal TERM to pid 24959 Nov 26 18:23:45 dhcpcd[23114]: waiting for pid 24959 to exit Nov 26 18:23:45 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:24:35 dhcpcd[23114]: pid 24959 failed to exit Nov 26 18:24:35 dhcpcd[27751]: sending signal TERM to pid 24959 Nov 26 18:24:35 dhcpcd[27751]: waiting for pid 24959 to exit Nov 26 18:24:35 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:25:25 dhcpcd[27751]: pid 24959 failed to exit Nov 26 18:25:25 dhcpcd[32365]: sending signal TERM to pid 24959 Nov 26 18:25:25 dhcpcd[32365]: waiting for pid 24959 to exit Nov 26 18:25:25 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:26:15 dhcpcd[32365]: pid 24959 failed to exit Nov 26 18:26:15 dhcpcd[5435]: sending signal TERM to pid 24959 Nov 26 18:26:15 dhcpcd[5435]: waiting for pid 24959 to exit Nov 26 18:26:15 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:27:05 dhcpcd[5435]: pid 24959 failed to exit Nov 26 18:27:05 dhcpcd[11446]: sending signal TERM to pid 24959 Nov 26 18:27:05 dhcpcd[11446]: waiting for pid 24959 to exit Nov 26 18:27:05 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:27:55 dhcpcd[11446]: pid 24959 failed to exit Nov 26 18:27:55 dhcpcd[16058]: sending signal TERM to pid 24959 Nov 26 18:27:55 dhcpcd[16058]: waiting for pid 24959 to exit Nov 26 18:27:55 dhcpcd[24960]: received SIGTERM, stopping Nov 26 18:28:45 dhcpcd[16058]: pid 24959 failed to exit Nov 26 18:28:45 dhcpcd[20687]: sending signal TERM to pid 24959 Nov 26 18:28:45 dhcpcd[20687]: waiting for pid 24959 to exit Nov 26 18:28:45 dhcpcd[24960]: received SIGTERM, stopping
ps -ef | grep dhcpcd dhcpcd 705 1 0 18:16 ? 00:00:00 dhcpcd: eth0 [ip4] [ip6] root 706 705 0 18:16 ? 00:00:00 dhcpcd: [privileged proxy] eth0 [ip4] [ip6] dhcpcd 707 705 0 18:16 ? 00:00:00 dhcpcd: [network proxy] eth0 [ip4] [ip6] dhcpcd 708 705 0 18:16 ? 00:00:00 dhcpcd: [control proxy] eth0 [ip4] [ip6] dhcpcd 717 706 0 18:16 ? 00:00:00 dhcpcd: [DHCP6 proxy] fe80::5a70:7fff:fe90:4690 dhcpcd 719 706 0 18:16 ? 00:00:00 dhcpcd: [BPF ARP] eth0 214.14.21.44 dhcpcd 740 706 0 18:16 ? 00:00:00 dhcpcd: [BOOTP proxy] 214.14.21.44 dhcpcd 24959 1 0 18:19 ? 00:00:00 dhcpcd: port0 [ip4] [ip6] root 24960 24959 0 18:19 ? 00:00:00 dhcpcd: [privileged proxy] port0 [ip4] [ip6] dhcpcd 24961 24959 0 18:19 ? 00:00:00 dhcpcd: [network proxy] port0 [ip4] [ip6] dhcpcd 24962 24959 0 18:19 ? 00:00:00 dhcpcd: [control proxy] port0 [ip4] [ip6] dhcpcd 25027 24960 0 18:19 ? 00:00:00 dhcpcd: [DHCP6 proxy] fe80::5a70:7fff:fe90:4695 root 25308 5568 0 18:29 ? 00:00:00 /sbin/dhcpcd --exit port0 root 25703 4218 0 18:29 pts/0 00:00:00 grep dhcpcd
strace -f -p 24959 strace: Process 24959 attached epoll_pwait(6, [{events=EPOLLIN, data={u32=2490238448, u64=367562458608}}], 4, -1, [], 8) = 1 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65760) = 216 epoll_pwait(6, [{events=EPOLLIN, data={u32=2490238448, u64=367562458608}}], 4, -1, [], 8) = 1 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65760) = 216 epoll_pwait(6, [{events=EPOLLIN, data={u32=2490238448, u64=367562458608}}], 4, -1, [], 8) = 1 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65760) = 248 epoll_pwait(6, [{events=EPOLLIN, data={u32=2490238448, u64=367562458608}}], 4, -1, [], 8) = 1 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65760) = 248 epoll_pwait(6, [{events=EPOLLIN, data={u32=2490238448, u64=367562458608}}], 4, -1, [], 8) = 1 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65760) = 216 epoll_pwait(6, [{events=EPOLLIN, data={u32=2490238448, u64=367562458608}}], 4, -1, [], 8) = 1 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65760) = 248 epoll_pwait(6, ^Cstrace: Process 24959 detached <detached ...>
strace -f -p 24960 strace: Process 24960 attached epoll_pwait(7, ^Cstrace: Process 24960 detached <detached ...>
strace -f -p 24961 strace: Process 24961 attached epoll_pwait(3, [{events=EPOLLIN, data={u32=2490237888, u64=367562458048}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d559", &sin6_addr), sin6_scope_id=if_nametoindex("riport1.10")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0\16\333@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x35\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 88 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325Y5\0\0\0\0\0\0\0"..., iov_len=184}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 248 epoll_pwait(3, [{events=EPOLLIN, data={u32=2490237888, u64=367562458048}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d559", &sin6_addr), sin6_scope_id=if_nametoindex("riport1")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0\214\237@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x2d\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 56 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325Y-\0\0\0\0\0\0\0"..., iov_len=152}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 216 epoll_pwait(3, [{events=EPOLLIN, data={u32=2490237888, u64=367562458048}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d556", &sin6_addr), sin6_scope_id=if_nametoindex("port0")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0\16\333@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x33\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 88 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325V3\0\0\0\0\0\0\0"..., iov_len=184}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 248 epoll_pwait(3, [{events=EPOLLIN, data={u32=2490237888, u64=367562458048}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d556", &sin6_addr), sin6_scope_id=if_nametoindex("riport0")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0\214\237@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x2b\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 56 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325V+\0\0\0\0\0\0\0"..., iov_len=152}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 216 epoll_pwait(3, ^Cstrace: Process 24961 detached <detached ...>
strace -f -p 24962 strace: Process 24962 attached epoll_pwait(7, ^Cstrace: Process 24962 detached <detached ...>
strace -f -p 25027 strace: Process 25027 attached epoll_pwait(3, ^Cstrace: Process 25027 detached <detached ...>
Now when checking code further, I have suspect that problem with hang is because main dhcpcd process is not exiting its main loop? So we don't call ps_stop() at all. Think we are returning somewhere before dhcpcd_dropped() reaches eloop_exit(ctx->eloop, EXIT_SUCCESS);?
This is newest logs:
Nov 27 17:42:52 air6449_mib1_1 dhcpcd[8094]: port0: executing: /usr/libexec/dhcpcd-run-hooks ROUTERADVERT Nov 27 17:42:52 air6449_mib1_1 pncdhcpa[2486]: RCV: 22 (ROUTERADVERT) Nov 27 17:42:52 air6449_mib1_1 dhcpcd[8094]: port0: reading lease: /var/lib/dhcpcd/port0.lease6 Nov 27 17:42:52 air6449_mib1_1 dhcpcd[8094]: port0: confirming prior DHCPv6 lease Nov 27 17:42:52 air6449_mib1_1 dhcpcd[8094]: port0: delaying CONFIRM6 (xid 0x0318d3), next in 1.1 seconds Nov 27 17:42:53 air6449_mib1_1 dhcpcd[2511]: sending signal TERM to pid 8093 Nov 27 17:42:53 air6449_mib1_1 dhcpcd[2511]: waiting for pid 8093 to exit Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: received SIGTERM, stopping Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: port0: removing interface Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: dhcpcd_drop_af: calling dhcp6_drop Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: dhcp6_freedrop: drop 1 Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: dhcp6_freedrop: state 6 options 310627623444699145 Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: port0: ipv6_freedrop_addrs drop 1 notflags 0 Nov 27 17:42:53 air6449_mib1_1 dhcpcd[8094]: dhcpcd_drop_af: calling ipv6nd_drop
Nov 27 17:43:34 air6449_mib1_1 dhcpcd[8094]: port0: pid 0 deleted route to fd1a:3e8e:668d:64::/64 Nov 27 17:43:43 air6449_mib1_1 dhcpcd[2511]: pid 8093 failed to exit Nov 27 17:43:43 air6449_mib1_1 dhcpcd[2511]: enter ps_stop Nov 27 17:43:43 air6449_mib1_1 dhcpcd[2511]: enter ps_stopwait Nov 27 17:43:43 air6449_mib1_1 dhcpcd[15212]: sending signal TERM to pid 8093 Nov 27 17:43:43 air6449_mib1_1 dhcpcd[15212]: waiting for pid 8093 to exit
ps -ef | grep dhcpcd dhcpcd 705 1 0 17:32 ? 00:00:00 dhcpcd: eth0 [ip4] [ip6] root 706 705 0 17:32 ? 00:00:00 dhcpcd: [privileged proxy] eth0 [ip4] [ip6] dhcpcd 707 705 0 17:32 ? 00:00:00 dhcpcd: [network proxy] eth0 [ip4] [ip6] dhcpcd 708 705 0 17:32 ? 00:00:00 dhcpcd: [control proxy] eth0 [ip4] [ip6] dhcpcd 717 706 0 17:32 ? 00:00:00 dhcpcd: [DHCP6 proxy] fe80::5a70:7fff:fe90:4690 dhcpcd 719 706 0 17:32 ? 00:00:00 dhcpcd: [BPF ARP] eth0 214.14.21.44 dhcpcd 740 706 0 17:32 ? 00:00:00 dhcpcd: [BOOTP proxy] 214.14.21.44 root 3684 31128 0 18:03 pts/1 00:00:00 grep dhcpcd dhcpcd 8093 1 0 17:32 ? 00:00:00 dhcpcd: port0 [ip4] [ip6] root 8094 8093 0 17:32 ? 00:00:00 dhcpcd: [privileged proxy] port0 [ip4] [ip6] dhcpcd 8095 8093 0 17:32 ? 00:00:00 dhcpcd: [network proxy] port0 [ip4] [ip6] dhcpcd 8096 8093 0 17:32 ? 00:00:00 dhcpcd: [control proxy] port0 [ip4] [ip6] dhcpcd 8157 8094 0 17:32 ? 00:00:00 dhcpcd: [DHCP6 proxy] fe80::5a70:7fff:fe90:4695
I think problem is in dhcpcd_dropped() which was called from stop_all_interfaces() on SIGTERM receive. We are returning here:
if (ifp->options == NULL ||
!(ifp->options->options & DHCPCD_STOPPING) ||
dhcpcd_ifrunning(ifp))
return;
as dhcpcd_ifrunning() returns true. Because network proxy is still receiving some RA
if (D_CSTATE(ifp) != NULL)
return true;
So it is expected not to reach exit in this case.
if (ifp == NULL){
loginfox("dhcpcd_dropped: exit");
eloop_exit(ctx->eloop, EXIT_SUCCESS);
}
Not sure how was this use case conceived in this scenario on dhcpcd 10.3.0? As we don't have release enabled so we will not even send release packet to network and get some reply. As we are not using release currently could we do:
if (ctx->options & DHCPCD_EXITING) { if (!(ctx->options & DHCPCD_RELEASE)) { loginfox("dhcpcd_dropped: exiting without RELEASE; ignoring ifrunning"); goto force_exit; } }
/* Old logic */ if (ifp->options == NULL || !(ifp->options->options & DHCPCD_STOPPING) || dhcpcd_ifrunning(ifp)) return;
/* normal exit path */ ... force_exit: eloop_exit(ctx->eloop, EXIT_SUCCESS);
to avoid parent process waiting on proxy exit for ever, as stop was not requested.
I got crash on dhcpcd v10.3.0 after using --release instead --exit on stop. From backtrace it crashes in if_findifpfromcmsg() when accessing ctx->ifaces when ifaces == null.
Core was generated by `dhcpcd: port0 [ip4] [ip6] '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000555fe45664 in if_findifpfromcmsg (ctx=ctx@entry=0x7fc240e8a0, msg=msg@entry=0x7fc23fe630, hoplimit=hoplimit@entry=0x0) at if.c:941
941 if.c: No such file or directory.
(gdb) bt
#0 0x000000555fe45664 in if_findifpfromcmsg (ctx=ctx@entry=0x7fc240e8a0, msg=msg@entry=0x7fc23fe630, hoplimit=hoplimit@entry=0x0) at if.c:941
#1 0x000000555fe6ccdc in dhcp6_recvmsg (ctx=0x7fc240e8a0, msg=0x7fc23fe630, ia=ia@entry=0x0) at dhcp6.c:3769
#2 0x000000555fe73560 in ps_inet_dispatch (arg=
(gdb) bt full
#0 0x000000555fe45664 in if_findifpfromcmsg (ctx=ctx@entry=0x7fc240e8a0, msg=msg@entry=0x7fc23fe630, hoplimit=hoplimit@entry=0x0) at if.c:941
cm =
(gdb) thread apply all bt
Thread 1 (LWP 7824):
#0 0x000000555fe45664 in if_findifpfromcmsg (ctx=ctx@entry=0x7fc240e8a0, msg=msg@entry=0x7fc23fe630, hoplimit=hoplimit@entry=0x0) at if.c:941
#1 0x000000555fe6ccdc in dhcp6_recvmsg (ctx=0x7fc240e8a0, msg=0x7fc23fe630, ia=ia@entry=0x0) at dhcp6.c:3769
#2 0x000000555fe73560 in ps_inet_dispatch (arg=
crash is happens as ctx->ifaces = 0x0,
(gdb) p *ctx
$1 = {pidfile = "/run/dhcpcd/port0.pid", '\000' <repeats 13 times>, vendor = "dhcpcd-10.3.0", '\000' <repeats 185 times>, fork_fd = -1,
cffile = 0x557b919360 "/etc/pnc-dhcpcd.conf", options = 310344232912603151, logfile = 0x0, argc = 6, argv = 0x7fc240ee68, ifac = 0, ifav = 0x0, ifdc = 0, ifdv = 0x0, ifc = 1, ifv = 0x7fc240ee90, ifcc = 0, ifcv = 0x0,
duid_type = 0 '\000', duid = 0x557b938330 "x0\304~U", duid_len = 16, ifaces = 0x0, ctl_buf = 0x0, ctl_buflen = 0, ctl_bufpos = 0, ctl_extra = 0, routes = {rbt_root = 0x0, rbt_ops = 0x555fe904a0 <rt_compare_os_ops>,
rbt_minmax = {0x7fc240ea78, 0x7fc240ea78}}, froutes = {rbt_root = 0x0, rbt_ops = 0x555fe904c0 <rt_compare_free_ops>, rbt_minmax = {0x7fc240ea98, 0x7fc240ea98}}, rt_order = 0, pf_inet_fd = -1, priv = 0x0, link_fd = -1,
link_rcvbuf = 0, seq = 703, sseq = 0, eloop = 0x557b92e8a0, script = 0x555fe79fe0 "/usr/libexec/dhcpcd-run-hooks", script_fp = 0x557b932480, script_buf = 0x557b953a50 "3\037\306~U", script_buflen = 276,
script_env = 0x557b93ab40, script_envlen = 40, control_fd = -1, control_unpriv_fd = -1, control_fds = {tqh_first = 0x0, tqh_last = 0x7fc240eb20}, control_sock = "/run/dhcpcd/port0.sock", '\000' <repeats 14 times>,
control_sock_unpriv = "/run/dhcpcd/port0.unpriv.sock", '\000' <repeats 14 times>, control_group = 0, vivso = 0x0, vivso_len = 0, randomstate = 0x0, ps_user = 0x7f9b956668
but this seems to happen based on syslog after exit by release on new start, why?
Dec 2 14:06:18 dhcpcd[7825]: received SIGALRM, releasing Dec 2 14:06:18 dhcpcd[7825]: port0: removing interface Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_drop_af: calling dhcp6_drop Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 1 Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: state 3 options 310627623444695051 Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: calling startrelease Dec 2 14:06:18 dhcpcd[7825]: port0: multicasting RELEASE6 (xid 0x143dd8), next in 1.0 seconds Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_drop_af: calling ipv6nd_drop Dec 2 14:06:18 dhcpcd[7825]: port0: deleting route to fd10:23cb:304b:76::/64 Dec 2 14:06:18 dhcpcd[7825]: port0: deleting route to fd10:23cb:304b:70::/64 Dec 2 14:06:18 dhcpcd[7825]: port0: deleting route to fd10:23cb:304b:64::/64 Dec 2 14:06:18 dhcpcd[7825]: port0: deleting default route via fe80::ae33:51ff:fe13:d559 Dec 2 14:06:18 dhcpcd[7825]: port0: executing: /usr/libexec/dhcpcd-run-hooks ROUTERADVERT Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: port0: D_CSTATE != NULL Dec 2 14:06:18 dhcpcd[7825]: port0: executing: /usr/libexec/dhcpcd-run-hooks STOP Dec 2 14:06:18 dhcpcd[7825]: port0: executing: /usr/libexec/dhcpcd-run-hooks RELEASE Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: port0: D6_CSTATE != NULL Dec 2 14:06:18 dhcpcd[7825]: port0: fe80::ae33:51ff:fe13:d559 acknowledged RELEASE6 Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 1 Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: state 14 options 310627623444695051 Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: calling dhcp_unlink Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: cleaning started Dec 2 14:06:18 dhcpcd[7825]: port0: ipv6_freedrop_addrs drop 1 notflags 0 Dec 2 14:06:18 dhcpcd[7825]: port0: deleting address fd10:23cb:304b:76::1/128 Dec 2 14:06:18 dhcpcd[7825]: port0: executing: /usr/libexec/dhcpcd-run-hooks RELEASE6 Dec 2 14:06:18 dhcpcd[7825]: enter ps_stop Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: cleaned all Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_ifrunning: returns false Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: active 2 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: stopping Dec 2 14:06:18 dhcpcd[7825]: port0: executing: /usr/libexec/dhcpcd-run-hooks STOPPED Dec 2 14:06:18 dhcpcd[7825]: enter ps_stop Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 dhcpcd[7825]: port0: BPF BOOTP exited from PID 8058 Dec 2 14:06:18 dhcpcd[7825]: port0: DHCP6 proxy fd10:23cb:304b:76::1 exited from PID 8596 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: active 310344232912603151 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: 2. active 310344232912603151 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_ifrunning: returns false Dec 2 14:06:18 last message buffered 22 times Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_ifrunning: returns false Dec 2 14:06:18 last message buffered 31 times Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: exit Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: not exit Dec 2 14:06:18 dhcpcd[7825]: enter ps_stop Dec 2 14:06:18 dhcpcd[7825]: ps_ctl_stop 0 Dec 2 14:06:18 dhcpcd[7825]: ps_inet_stop 0 Dec 2 14:06:18 dhcpcd[7825]: ps_root_stopprocesses Dec 2 14:06:18 dhcpcd[7825]: enter ps_stop Dec 2 14:06:18 last message buffered 1 times Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 last message buffered 1 times Dec 2 14:06:18 dhcpcd[7825]: enter ps_stop Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 dhcpcd[7825]: port0: DHCP6 proxy fe80::5a70:7fff:fe90:4695 exited from PID 7909 Dec 2 14:06:18 dhcpcd[7825]: enter ps_stop Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: dhcp6_freedrop: drop 0 Dec 2 14:06:18 dhcpcd[7825]: dhcpcd_dropped: Dec 2 14:06:18 dhcpcd[7825]: enter ps_stopwait Dec 2 14:06:18 dhcpcd[21899]: enter ps_stop Dec 2 14:06:18 dhcpcd[21899]: enter ps_stopwait Dec 2 14:06:20 dhcpcd[22321]: dhcpcd-10.3.0 starting Dec 2 14:06:20 dhcpcd[22321]: chrooting as dhcpcd to /var/lib/dhcpcd Dec 2 14:06:20 dhcpcd[22323]: enter ps_stop Dec 2 14:06:20 dhcpcd[22326]: spawned manager process on PID 22326 Dec 2 14:06:20 dhcpcd[22323]: enter ps_stopwait Dec 2 14:06:20 dhcpcd[22327]: spawned privileged proxy on PID 22327 Dec 2 14:06:20 dhcpcd[22327]: spawned network proxy on PID 22328 Dec 2 14:06:20 dhcpcd[22327]: spawned controller proxy on PID 22329 Dec 2 14:06:20 dhcpcd[22327]: DUID 00:02:00:00:00:c1:41:34:30:33:30:35:37:32:35:35 Dec 2 14:06:20 dhcpcd[22327]: sandbox unavailable: seccomp Dec 2 14:06:20 dhcpcd[22327]: port0: spawned DHCP6 proxy fe80::5a70:7fff:fe90:4695 on PID 22421 Dec 2 14:06:20 dhcpcd[22327]: port0: executing: /usr/libexec/dhcpcd-run-hooks PREINIT Dec 2 14:06:20 pncdhcpa[22428]: RCV: 0 (PREINIT) Dec 2 14:06:20 dhcpcd[22327]: port0: executing: /usr/libexec/dhcpcd-run-hooks CARRIER Dec 2 14:06:20 pncdhcpa[22433]: RCV: 1 (CARRIER) Dec 2 14:06:20 dhcpcd[22327]: port0: IAID 7f:90:46:95 Dec 2 14:06:20 dhcpcd[22327]: port0: delaying IPv6 router solicitation for 0.5 seconds Dec 2 14:06:20 dhcpcd[22327]: port0: delaying IPv4 for 1.1 seconds
by checking code it looks like ctx->ifaces are removed before stopping ps_inet. So, we still end up in receiving late IPv6 packets. I just wanted to try this release option, but in our system this is not used still and it would introduce 1 second release timeout that it needs to be avoided. Because we still have sequences that can call frequent dhcpcd stop/start. And in release option dhcpcd will not exit until timeout callback fires. I am thinking in approach if we are NOT doing a DHCPv4/v6 RELEASE, to stop privsep proxy FIRST.
Like:
if (!(ctx->options & DHCPCD_RELEASE)) { loginfox("EXITING without RELEASE: stopping privsep early"); ps_stop(ctx); }
stop_all_interfaces(ctx);
Could you test by not specifying any interface on the dhcpcd command line please when starting and stopping? This puts dhcpcd into manager mode where you don't need any per address proxy processes.
Hi,
We cannot test with “no interfaces on the command line” because on our platform we rely on dhcpcd’s per-interface address proxy processes (DHCP6 proxy, BOOTP BPF, etc.) as part of our architecture. Running dhcpcd in manager mode (no interfaces specified) disables these per-interface processes, so the behaviour will not match our actual system.
I seem to have fixed dhcpcd hanging problem with adding ps_stop() before stop_all_interfaces() in dhcpcd v10.3.0 baseline:
static void @@ -1572,6 +1589,10 @@ dhcpcd_signal_cb(int sig, void *arg) return;
ctx->options |= DHCPCD_EXITING;
- if (!(ctx->options & DHCPCD_RELEASE)) {
-
loginfox("EXITING without RELEASE: stopping privsep early"); -
ps_stop(ctx); - } if (!(ctx->options & DHCPCD_TEST) && stop_all_interfaces(ctx, opts)) {
and static void @@ -1572,6 +1589,10 @@ dhcpcd_signal_cb(int sig, void *arg) return;
ctx->options |= DHCPCD_EXITING;
- if (!(ctx->options & DHCPCD_RELEASE)) {
-
loginfox("EXITING without RELEASE: stopping privsep early"); -
ps_stop(ctx); - } if (!(ctx->options & DHCPCD_TEST) && stop_all_interfaces(ctx, opts)) {
I get these traces now:
Dec 3 11:03:52 dhcpcd[20014]: waiting for pid 19003 to exit Dec 3 11:03:52 dhcpcd[20014]: sending signal TERM to pid 19003 Dec 3 11:03:52 dhcpcd[19004]: received SIGTERM, stopping Dec 3 11:03:52 dhcpcd[19004]: EXITING without RELEASE: stopping privsep early Dec 3 11:03:52 dhcpcd[19004]: enter ps_stopwait Dec 3 11:03:52 dhcpcd[19004]: enter ps_stop Dec 3 11:03:52 dhcpcd[19004]: ps_ctl_stop 0 Dec 3 11:03:52 dhcpcd[19004]: ps_inet_stop 0 Dec 3 11:03:52 dhcpcd[19004]: ps_root_stopprocesses Dec 3 11:03:52 dhcpcd[19004]: enter ps_stop Dec 3 11:03:52 last message buffered 1 times Dec 3 11:03:52 dhcpcd[19004]: enter ps_stopwait Dec 3 11:03:52 last message buffered 1 times Dec 3 11:03:52 dhcpcd[19004]: enter ps_stop Dec 3 11:03:52 dhcpcd[19004]: enter ps_stopwait Dec 3 11:03:52 dhcpcd[19004]: ribridge0: DHCP6 proxy fe80::5a70:7fff:fe90:4695 exited from PID 19084 Dec 3 11:03:52 dhcpcd[19004]: enter ps_stop Dec 3 11:03:52 dhcpcd[19004]: enter ps_stopwait Dec 3 11:03:52 dhcpcd[19004]: ribridge0: BPF BOOTP exited from PID 19941 Dec 3 11:03:52 dhcpcd[19004]: enter ps_stop Dec 3 11:03:52 dhcpcd[19004]: enter ps_stopwait Dec 3 11:03:52 dhcpcd[19004]: ribridge0: removing interface Dec 3 11:03:52 dhcpcd[19004]: dhcpcd_drop_af: calling dhcp6_drop Dec 3 11:03:52 dhcpcd[19004]: dhcp6_freedrop: drop 1 Dec 3 11:03:52 dhcpcd[19004]: dhcp6_freedrop: state 1 options 310627623444699145 Dec 3 11:03:52 dhcpcd[19004]: dhcp6_freedrop: cleaning started Dec 3 11:03:52 dhcpcd[19004]: ribridge0: ipv6_freedrop_addrs drop 1 notflags 0 Dec 3 11:03:52 dhcpcd[19004]: dhcp6_freedrop: cleaned all Dec 3 11:03:52 dhcpcd[19004]: dhcpcd_dropped: Dec 3 11:03:52 dhcpcd[19004]: ribridge0: D_CSTATE != NULL Dec 3 11:03:52 dhcpcd[19004]: dhcpcd_drop_af: calling ipv6nd_drop Dec 3 11:03:52 dhcpcd[20014]: enter ps_stop Dec 3 11:03:52 dhcpcd[20014]: enter ps_stopwait Dec 3 11:03:52 dhcpcd[20024]: dhcpcd-10.3.0 starting
With this as approach I think we assure no more inbound packets would be processed during shutdown. As we don't need release feature for now, we can safely close sockets on proxy processes. Nothing should enters dhcpcd from outside anymore. Before I had problem that ps_inet stayed alive and still was receiving something on its fd. And dhcpcd main was not exiting eloop to call ps_stop() as there was no trigger in case release is not used. Using --release from our side currently is problematic as slows down shutdown every signal time as we would wait for 1 sec timeout callback to fire. Continue testing above.
I have this patch on dhcpcd 10.3.0 baseline with using --release from client program.
- static void +@@ -1572,6 +1589,10 @@ dhcpcd_signal_cb(int sig, void *arg)
-
return; - ctx->options |= DHCPCD_EXITING; ++ if (!(opts & DHCPCD_RELEASE)) { ++ loginfox("EXITING without RELEASE: stopping privsep early"); ++ ps_stop(ctx); ++ }
- if (!(ctx->options & DHCPCD_TEST) &&
-
stop_all_interfaces(ctx, opts)) - { +@@ -1720,6 +1741,10 @@ dumperr:
- if (opts & (DHCPCD_EXITING | DHCPCD_RELEASE)) {
-
if (oifind == argc && af == AF_UNSPEC) { -
ctx->options |= DHCPCD_EXITING;
++ if ((opts & DHCPCD_EXITING) && !(opts & DHCPCD_RELEASE)) { ++ loginfox("EXITING without RELEASE: stopping privsep early"); ++ ps_stop(ctx); ++ }
-
if (stop_all_interfaces(ctx, opts) == false) -
eloop_exit(ctx->eloop, EXIT_SUCCESS); -
/* We did stop an interface, it will notify us once
+diff --git a/src/if.c b/src/if.c +index 9b5ac2b..3aaec14 100644 +--- a/src/if.c ++++ b/src/if.c +@@ -936,6 +936,11 @@ if_findifpfromcmsg(struct dhcpcd_ctx *ctx, struct msghdr *msg, int *hoplimit)
-
} - #endif
- } ++ if (ctx == NULL || ctx->ifaces == NULL) { ++ errno = ENXIO; ++ loginfox("if_findifpfromcmsg: ifaces is NULL"); ++ return NULL; ++ }
I managed to avoid mentioned if_findifpfromcmsg() crash with if ctx->ifaces guard above. And as you can see release packet was sent and reply received. But dhcpcd main stucks as dhcpcd_ifrunning() keeps returning true and can not reach eloop_exit(). Additional tracing confirms we return true here:
if (D_CSTATE(ifp) != NULL){
loginfox("%s: D_CSTATE != NULL", ifp->name);
return true;
}
And even if release option is used we can end up in scenario where network proxy pid=9366 is still receiving something on it's fd. Based on strace it seems to be ICMPv6 RA what keeps network proxy busy. Leading to deadlock where network proxy is not stopped and still receives RA, while dhcpcd main waits on network proxy to say it is done to order stop. This test confirms this is racing issue between main dhcpcd and network proxy not connected to release option.
Dec 8 13:09:51 dhcpcd[7523]: sending signal ALRM to pid 9364 Dec 8 13:09:51 dhcpcd[7523]: waiting for pid 9364 to exit Dec 8 13:09:51 pncdhcpa[7524]: RCV: 22 (ROUTERADVERT) Dec 8 13:09:51 dhcpcd[9365]: potrt0: reading lease: /var/lib/dhcpcd/potrt0.lease6 Dec 8 13:09:51 dhcpcd[9365]: potrt0: confirming prior DHCPv6 lease Dec 8 13:09:51 dhcpcd[9365]: potrt0: delaying CONFIRM6 (xid 0xd3ab5a), next in 1.0 seconds Dec 8 13:09:51 dhcpcd[9365]: received SIGALRM, releasing Dec 8 13:09:51 dhcpcd[9365]: potrt0: removing interface Dec 8 13:09:51 dhcpcd[9365]: dhcpcd_drop_af: calling dhcp6_drop Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: drop 1 Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: state 6 options 310627623444695051 Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: calling startrelease Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: drop 1 Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: state 14 options 310627623444695051 Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: calling dhcp_unlink Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: cleaning started Dec 8 13:09:51 dhcpcd[9365]: potrt0: ipv6_freedrop_addrs drop 1 notflags 0 Dec 8 13:09:51 dhcpcd[9365]: potrt0: executing: /usr/libexec/dhcpcd-run-hooks RELEASE6 Dec 8 13:09:51 dhcpcd[9365]: dhcp6_freedrop: cleaned all Dec 8 13:09:51 dhcpcd[9365]: dhcpcd_dropped: Dec 8 13:09:51 dhcpcd[9365]: potrt0: D_CSTATE != NULL Dec 8 13:09:51 dhcpcd[9365]: dhcpcd_drop_af: calling ipv6nd_drop Dec 8 13:09:51 dhcpcd[9365]: potrt0: deleting route to fd9a:f105:37d3:76::/64 Dec 8 13:09:51 dhcpcd[9365]: potrt0: deleting route to fd9a:f105:37d3:64::/64
Dec 8 13:09:51 dhcpcd[9365]: potrt0: executing: /usr/libexec/dhcpcd-run-hooks ROUTERADVERT Dec 8 13:09:51 pncdhcpa[7529]: RCV: 22 (ROUTERADVERT) Dec 8 13:09:51 dhcpcd[9365]: dhcpcd_dropped: Dec 8 13:09:51 dhcpcd[9365]: potrt0: D_CSTATE != NULL Dec 8 13:09:51 dhcpcd[9365]: dhcpcd_dropped: Dec 8 13:09:51 dhcpcd[9365]: potrt0: D_CSTATE != NULL
Dec 8 13:10:41 dhcpcd[7523]: pid 9364 failed to exit Dec 8 13:10:41 dhcpcd[7523]: enter ps_stop Dec 8 13:10:41 dhcpcd[7523]: enter ps_stopwait Dec 8 13:10:42 dhcpcd[18252]: sending signal ALRM to pid 9364 Dec 8 13:10:42 dhcpcd[18252]: waiting for pid 9364 to exit
root@:~# ps -ef | grep dhcpcd dhcpcd 705 1 0 11:06 ? 00:00:00 dhcpcd: eth0 [ip4] [ip6] root 706 705 0 11:06 ? 00:00:00 dhcpcd: [privileged proxy] eth0 [ip4] [ip6] dhcpcd 707 705 0 11:06 ? 00:00:00 dhcpcd: [network proxy] eth0 [ip4] [ip6] dhcpcd 708 705 0 11:06 ? 00:00:00 dhcpcd: [control proxy] eth0 [ip4] [ip6] dhcpcd 717 706 0 11:06 ? 00:00:00 dhcpcd: [BPF ARP] eth0 214.14.21.44 dhcpcd 718 706 0 11:06 ? 00:00:00 dhcpcd: [DHCP6 proxy] fe80::5a70:7fff:fe90:4690 dhcpcd 739 706 0 11:06 ? 00:00:00 dhcpcd: [BOOTP proxy] 214.14.21.44 dhcpcd 9364 1 0 13:04 ? 00:00:00 dhcpcd: potrt0 [ip4] [ip6] root 9365 9364 0 13:04 ? 00:00:00 dhcpcd: [privileged proxy] potrt0 [ip4] [ip6] dhcpcd 9366 9364 0 13:04 ? 00:00:00 dhcpcd: [network proxy] potrt0 [ip4] [ip6] dhcpcd 9367 9364 0 13:04 ? 00:00:00 dhcpcd: [control proxy] potrt0 [ip4] [ip6] dhcpcd 9435 9365 0 13:04 ? 00:00:00 dhcpcd: [DHCP6 proxy] fe80::5a70:7fff:fe90:4695 root 20970 5596 0 13:16 ? 00:00:00 /sbin/dhcpcd --release potrt0 root 24537 9766 0 13:17 pts/2 00:00:00 grep dhcpcd
root@:~# strace -f -p 9366 strace: Process 9366 attached epoll_pwait(3, [{events=EPOLLIN, data={u32=1557173184, u64=366629393344}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d559", &sin6_addr), sin6_scope_id=if_nametoindex("riport1")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0\237\242@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x2c\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 56 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325Y,\0\0\0\0\0\0\0"..., iov_len=152}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 216 epoll_pwait(3, [{events=EPOLLIN, data={u32=1557173184, u64=366629393344}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d556", &sin6_addr), sin6_scope_id=if_nametoindex("riport0")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0\237\242@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x2a\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 56 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325V*\0\0\0\0\0\0\0"..., iov_len=152}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 216 epoll_pwait(3, [{events=EPOLLIN, data={u32=1557173184, u64=366629393344}}], 4, -1, [], 8) = 1 recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::ae33:51ff:fe13:d559", &sin6_addr), sin6_scope_id=if_nametoindex("riport1.10")}, msg_namelen=128 => 28, msg_iov=[{iov_base="\206\0004\341@\310\0\22\0\0\0\0\0\0\0\0\3\4@\240\0\1Q\200\0\08@\0\0\0\0"..., iov_len=65536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x33\x00\x00\x00"}, {cmsg_len=20, cmsg_level=SOL_IPV6, cmsg_type=0x34, cmsg_data="\xff\x00\x00\x00"}], msg_controllen=64, msg_flags=0}, MSG_WAITALL) = 88 sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=64}, {iov_base="\n\0\0\0\0\0\0\0\376\200\0\0\0\0\0\0\2563Q\377\376\23\325Y3\0\0\0\0\0\0\0"..., iov_len=184}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_EOR) = 248 epoll_pwait(3, ^Cstrace: Process 9366 detached
root@:~# ls -l /proc/9366/fd/5 lrwx------ 1 root root 64 Dec 8 13:41 /proc/9366/fd/5 -> 'socket:[28366]'
What is what in this output: root@:~# grep 28366 /proc/net/{unix,tcp,tcp6,udp,udp6,raw,raw6,packet,netlink} /proc/net/raw6: 68: 00000000000000000000000000000000:003A 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 28366 2 00000000955ccf89 0
To summarize, I think option to early order stop of proxy child processes before interface remove should help in case no release option is used.
For release option maybe to add in dhcp6_freedrop() after RELEASE reply/timout is done? if (state->state == DH6S_RELEASED) { loginfox("RELEASE done: stopping privsep now"); ps_stop(ctx); }