libraft
KayVee can hang on shutdown
Once in a blue moon, KayVee appears to hang on shutdown. The problem has been traced to the underlying Netty NioWorkerPool failing to shut down cleanly: it appears to be waiting for a CountDownLatch to reach 0, a condition that, for some reason, never occurs.
The full stack trace is at: KayVee 0.1.1 Shutdown Hang Stack
Another source of the hang: deadlock on KayVee 0.1.1 shutdown
This is caused by a System.exit() call within RaftAlgorithm, which can trigger a deadlock. See: http://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#exit(int). The fix is to either:
- Detect that I'm in the middle of a shutdown and not run System.exit()
- Avoid System.exit() entirely
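As a rough illustration of the first option, here is a minimal sketch of a shutdown check. It relies on the documented behaviour that Runtime.addShutdownHook throws IllegalStateException once the JVM has started shutting down. The class ShutdownGuard and its methods are hypothetical names for this example and are not part of libraft; the exit action is passed in so the logic can be exercised without killing the JVM.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper for illustration; not part of libraft or KayVee. */
final class ShutdownGuard {
    private ShutdownGuard() { }

    /**
     * Probes whether the JVM is already shutting down by registering and
     * immediately removing a no-op hook. Both calls throw
     * IllegalStateException once shutdown has begun.
     */
    static boolean isShutdownInProgress() {
        Thread probe = new Thread(() -> { });
        try {
            Runtime.getRuntime().addShutdownHook(probe);
            Runtime.getRuntime().removeShutdownHook(probe);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /**
     * Runs exitAction with the given status only if no shutdown is in
     * progress. Returns true if the action was invoked.
     */
    static boolean exitUnlessShuttingDown(int status, IntConsumer exitAction) {
        if (isShutdownInProgress()) {
            return false; // a shutdown is already running; do not call exit again
        }
        exitAction.accept(status);
        return true;
    }
}
```

In real code the call would look like `ShutdownGuard.exitUnlessShuttingDown(1, System::exit)`. Note that the check is racy (a shutdown can begin right after the probe returns), and it only helps when a shutdown has already started; it does nothing for a System.exit() call that is itself the trigger.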
OK. I think I have the exact cause here.
If I call System.exit() in a Netty I/O thread, the following happens:
- The thread locks the Shutdown.class object
- The Jetty ShutdownThread is invoked, which starts running the shutdown tasks we've registered
- One of the shutdown tasks is RaftAgent.stop(), which waits for all I/O threads to complete
And... deadlock. The Netty I/O thread is waiting for all the shutdown tasks to finish, but they can never complete, because one of those tasks is waiting for that same I/O thread to shut down.
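The wait cycle in the steps above can be modelled with plain threads. This is only a simulation, not the KayVee or Netty code: it never calls System.exit(), a local lock object stands in for Shutdown.class, and the "shutdown task" waits with a timeout so that the example terminates. All names here (ShutdownCycleDemo, ioThreadBlockedDuringShutdown) are made up for the sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Simulation only: models the wait cycle without calling System.exit(). */
final class ShutdownCycleDemo {
    private ShutdownCycleDemo() { }

    /**
     * Returns true if the simulated I/O thread was still alive when the
     * simulated shutdown task gave up waiting for it.
     */
    static boolean ioThreadBlockedDuringShutdown(long hookTimeoutMillis) {
        if (hookTimeoutMillis <= 0) {
            // join(0) would wait forever and turn the demo into a real deadlock
            throw new IllegalArgumentException("timeout must be positive");
        }
        final Object shutdownLock = new Object(); // stands in for Shutdown.class
        final AtomicBoolean ioStillAlive = new AtomicBoolean(false);
        final Thread[] ioRef = new Thread[1];

        Thread ioThread = new Thread(() -> {
            synchronized (shutdownLock) { // the "exit" call takes the lock
                Thread shutdownTask = new Thread(() -> {
                    try {
                        // like RaftAgent.stop(): wait for the I/O thread to finish
                        ioRef[0].join(hookTimeoutMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    ioStillAlive.set(ioRef[0].isAlive());
                }, "shutdown-task");
                shutdownTask.start();
                try {
                    // the "exit" call waits for the shutdown task to finish
                    shutdownTask.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "io-thread");
        ioRef[0] = ioThread;
        ioThread.start();
        try {
            ioThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running demo", e);
        }
        return ioStillAlive.get();
    }
}
```

Calling `ShutdownCycleDemo.ioThreadBlockedDuringShutdown(200)` returns true: the I/O thread cannot finish until the shutdown task does, and the shutdown task only finishes because its wait times out. In the real scenario the wait has no timeout, so neither side ever makes progress.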