broker: cannot force unload a module
Problem: There is no way to force the unload of a misbehaving broker module. For example, in #6886 the cron module was stuck in a loop in cronodate_remaining(). flux module remove cron hangs indefinitely since the module never re-enters the reactor. @garlick was thinking that module threads are cancelled after a timeout, but this needs to be verified. Even so, a note in pthread_cancel(3) indicates that, by default, cancellation can only occur at a cancellation point.
We should do some testing to ensure the broker can force unload misbehaving modules if necessary. This may require setting pthread options to allow asynchronous cancellation, or may require pthread_kill(3).
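As a starting point for that testing, here is a minimal standalone sketch (not broker code) of the default deferred cancellation behavior. The names stuck_module(), allow_cancel_point, and cancel_waits_for_cancellation_point() are hypothetical, invented for illustration. The thread spins without hitting a cancellation point, so a pending pthread_cancel() has no effect until the loop is allowed to call pthread_testcancel().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names throughout; this only models a module stuck in a loop. */
static atomic_bool allow_cancel_point;  /* when true, the loop reaches a cancellation point */
static atomic_ulong iterations;         /* progress counter bumped by the loop */

static void *stuck_module (void *arg)
{
    (void)arg;
    for (;;) {
        /* Atomic operations are not cancellation points */
        atomic_fetch_add (&iterations, 1);
        if (atomic_load (&allow_cancel_point))
            pthread_testcancel ();      /* explicit cancellation point */
    }
}

/* Return true if the thread kept running after pthread_cancel() and was
 * cancelled only once it reached pthread_testcancel().
 */
static bool cancel_waits_for_cancellation_point (void)
{
    pthread_t t;
    void *result = NULL;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000 }; /* 100ms */
    int rc;

    atomic_store (&allow_cancel_point, false);
    atomic_store (&iterations, 0);

    rc = pthread_create (&t, NULL, stuck_module, NULL);
    assert (rc == 0);
    rc = pthread_cancel (t);            /* request is queued, not acted upon yet */
    assert (rc == 0);
    (void)rc;

    /* Sample the counter twice; it should still be advancing */
    nanosleep (&delay, NULL);
    unsigned long before = atomic_load (&iterations);
    nanosleep (&delay, NULL);
    unsigned long after = atomic_load (&iterations);
    bool still_running = after > before;

    /* Let the loop reach a cancellation point so the pending request fires */
    atomic_store (&allow_cancel_point, true);
    if (pthread_join (t, &result) != 0)
        return false;
    return still_running && result == PTHREAD_CANCELED;
}
```

Build with -pthread and call cancel_waits_for_cancellation_point() from a small main(). If the loop never reached a cancellation point, pthread_join() would block forever, which mirrors the hang described above.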
I was mistaken. pthread_cancel() is only called on each broker module during broker shutdown, from modhash_destroy().
Hopefully the cron module in #6886 will pass through a cancellation point when it is finally restarted, so that the restart does not hang. It should, since tzset() is in the loop it's stuck in and pthreads(7) lists tzset() among the functions that may be cancellation points.
Just a thought, but perhaps we could add a --force option to flux module remove that tells the broker to use pthread_cancel() instead of sending a shutdown request?
Edit: er, --force is already taken. Maybe --cancel?
That sounds like a reasonable improvement. Seems like we would not actually want to use pthread_setcanceltype(3) to set the cancellation type of module threads to asynchronous because of this note in the manpage:
Setting the cancelability type to PTHREAD_CANCEL_ASYNCHRONOUS is rarely useful. Since the thread could be canceled at any time, it cannot safely reserve resources (e.g., allocating memory with malloc(3)), acquire mutexes, semaphores, or locks, and so on.
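For comparison, a small standalone sketch of what asynchronous cancelability would change. The names async_spinner() and async_cancel_interrupts_busy_loop() are hypothetical, and this is not a proposal for module threads. The loop has no cancellation points, yet pthread_cancel() terminates it. It is only safe here because the loop holds no locks and allocates nothing, which is exactly the restriction the manpage warns about.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; illustration only. */
static atomic_bool spinner_started;
static atomic_ulong spin_count;

static void *async_spinner (void *arg)
{
    int oldtype;
    int rc;

    (void)arg;
    /* Opt in to cancellation at any instruction, not just cancellation points */
    rc = pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    assert (rc == 0);
    (void)rc;
    atomic_store (&spinner_started, true);
    for (;;)
        atomic_fetch_add (&spin_count, 1);  /* no locks, no malloc, no cancellation points */
}

/* Return true if pthread_cancel() stopped the busy loop on its own. */
static bool async_cancel_interrupts_busy_loop (void)
{
    pthread_t t;
    void *result = NULL;

    atomic_store (&spinner_started, false);
    if (pthread_create (&t, NULL, async_spinner, NULL) != 0)
        return false;
    /* Wait until the thread has switched its cancelability type */
    while (!atomic_load (&spinner_started))
        sched_yield ();
    if (pthread_cancel (t) != 0)
        return false;
    if (pthread_join (t, &result) != 0)
        return false;
    return result == PTHREAD_CANCELED;
}
```

A real module calls into libflux, malloc(3), and so on, so it could be cancelled while holding a lock or partway through an allocation. That is presumably why deferred cancellation plus an explicit cancel request looks like the safer route.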