Prevent processing of network events after SIGTERM is handled
We found the root cause of the most recent dhcpcd crashes, which happened when dhcpcd --exit was run and the resulting SIGTERM was handled. The problem is in eloop_start(): exitnow is checked too late once SIGTERM has been processed, so the loop enters eloop_run_ppoll() -> ppoll() after stop_all_interfaces() has already run.
    int
    eloop_start(struct eloop *eloop, sigset_t *signals)
    {
        int error;
        struct eloop_timeout *t;
        struct timespec ts, *tsp;

        assert(eloop != NULL);
    #ifdef HAVE_KQUEUE
        UNUSED(signals);
    #endif
        for (;;) {
            **if (eloop->exitnow)
                break;**
    #ifndef HAVE_KQUEUE
            if (_eloop_nsig != 0) {
                int n = _eloop_sig[--_eloop_nsig];

                if (eloop->signal_cb != NULL)
                    eloop->signal_cb(n, eloop->signal_cb_ctx);
                continue;
            }
    #endif
            ..
            error = eloop_run_ppoll(eloop, tsp, signals);
As a consequence, ppoll() would sometimes report a network event and we would still call the corresponding callback (e.g. for REPLY6) after SIGTERM had already been handled. We suggest moving this check:
**if (eloop->exitnow)
break;**
to just below the if (_eloop_nsig != 0) block.
That way, once the SIGTERM callback dhcpcd_signal_cb() has finished and control is back in eloop_start(), the loop exits immediately instead of falling through to the ppoll() call.
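
The suggested ordering can be sketched with a small stand-alone model. This is only an illustration under stated assumptions, not dhcpcd code: struct toy_eloop, toy_signal_cb(), toy_poll() and toy_eloop_start() are hypothetical stand-ins for struct eloop, dhcpcd_signal_cb(), eloop_run_ppoll() and eloop_start(), and the "SIGTERM" is simulated by bumping a counter during the third poll. Because the continue in the signal branch returns to the top of the loop in either layout, this toy shows the proposed placement only and does not reproduce the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; field names are illustrative, not dhcpcd's. */
struct toy_eloop {
	bool exitnow;         /* set when the signal callback requests exit */
	int nsig;             /* pending signals, stands in for _eloop_nsig */
	int polls;            /* how many times the poll step was entered */
	int polls_after_exit; /* poll entries that happened after exitnow */
};

/* Stand-in for dhcpcd_signal_cb(): handling SIGTERM requests loop exit. */
static void
toy_signal_cb(struct toy_eloop *e)
{
	e->exitnow = true;
}

/* Stand-in for eloop_run_ppoll(): counts entries and simulates a signal. */
static void
toy_poll(struct toy_eloop *e)
{
	e->polls++;
	if (e->exitnow)
		e->polls_after_exit++;
	/* Assumption for the demo: a SIGTERM arrives during the third poll. */
	if (e->polls == 3)
		e->nsig++;
}

/* Loop with the exit check placed just below the pending-signal block. */
static int
toy_eloop_start(struct toy_eloop *e)
{
	assert(e != NULL);

	for (;;) {
		if (e->nsig != 0) {
			e->nsig--;
			toy_signal_cb(e);
			continue;
		}
		/* Proposed placement: checked after signals, before polling. */
		if (e->exitnow)
			break;
		toy_poll(e);
	}
	return e->polls_after_exit;
}
```

Calling toy_eloop_start() on a zero-initialized struct toy_eloop returns 0 in this model: the poll step is entered three times and never again after the simulated signal callback has run.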
OK, this is not the root cause; it is just a workaround.
But basically we need to deal with this properly elsewhere, as a DHCPv6 release (SIGALRM vs SIGTERM/SIGINT) means that we do expect an acknowledgement packet.
We will use your patch from https://github.com/NetworkConfiguration/dhcpcd/pull/536 with an additional small fix in dhcpcd_handlecarrier(), as explained in ticket 536.