
Bookie server runtime.exit() never trigger after registerBookie failed

Open xiang092689 opened this issue 2 years ago • 3 comments

BUG REPORT: Bookie server runtime.exit() is never triggered after registerBookie fails

try {
    stateManager.registerBookie(true).get();
} catch (Exception e) {
    LOG.error("Couldn't register bookie with zookeeper, shutting down : ", e);
    shutdown(ExitCode.ZK_REG_FAIL);
}

Describe the bug

After ZooKeeper and BookKeeper both shut down ungracefully, BookKeeper started up before ZooKeeper, and registering the bookie's ephemeral znode failed. The bookie server is expected to shut down in that case, and it did. However, the JVM kept running, although it is expected to exit once the bookie server has shut down.

Log:

Couldn't register bookie with zookeeper, shutting down : 
java.util.concurrent.ExecutionException: java.io.IOException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: java.io.IOException: ZK exception checking and wait ephemeral znode xxx expired

I'm sorry, I can't share more of the log for some reason, but this is the key part.

To Reproduce

  1. Increase the ZooKeeper tickTime config, for example to 200s, to make the bug easy to reproduce.
  2. Shut down BookKeeper and ZooKeeper ungracefully.
  3. Start BookKeeper before ZooKeeper.
  4. BookKeeper may also fail to start for other reasons. I run it in k8s, where the crashed BookKeeper is restarted until the described bug shows up.

Expected behavior

Throwing an exception in the catch block below would trigger the exception handler and the shutdown hook, which would fix this problem.

try {
    stateManager.registerBookie(true).get();
} catch (Exception e) {
    LOG.error("Couldn't register bookie with zookeeper, shutting down : ", e);
    shutdown(ExitCode.ZK_REG_FAIL);
}

But I wonder why shutdown() doesn't complete the future that the main thread is waiting on. It seems that the exception handler and the shutdown hook can only be triggered by an unhandled exception, which then completes the future.

There should be a better way to fix this.
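
To make the rethrow idea concrete, here is a minimal sketch of it, not BookKeeper's actual code. It assumes, as described above, that an uncaught exception on the bookie thread is what completes the future the main thread waits on. `RethrowSketch`, `register()`, `runBookieThread()` and `await()` are hypothetical names used only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RethrowSketch {

    /** Hypothetical stand-in for stateManager.registerBookie(true).get() failing. */
    static void register() throws Exception {
        throw new java.io.IOException("ephemeral znode not expired (simulated)");
    }

    /** Starts a worker thread and returns the future the main thread would wait on. */
    static CompletableFuture<Void> runBookieThread() {
        CompletableFuture<Void> stopped = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                register();
            } catch (Exception e) {
                // Rethrow instead of only shutting down, so the thread dies
                // with an uncaught exception and the handler below runs.
                throw new IllegalStateException("registration failed", e);
            }
        }, "bookie-sketch");
        // Assumption: the handler is what releases the waiting main thread.
        worker.setUncaughtExceptionHandler((thread, error) -> stopped.complete(null));
        worker.start();
        return stopped;
    }

    /** Returns true if the future completes within the given time. */
    static boolean await(CompletableFuture<Void> future, long millis) {
        try {
            future.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean unblocked = await(runBookieThread(), 5000);
        System.out.println("main thread unblocked: " + unblocked);
    }
}
```

In this toy model the rethrow releases the main thread, but it relies on the failure surfacing as an uncaught exception, which is why a more explicit fix may be preferable.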

xiang092689 avatar Jul 06 '23 07:07 xiang092689

I think the main cause of this failure is that the ComponentStarter's future cannot complete unless it receives an exit signal. However, shutdown(ExitCode.ZK_REG_FAIL) doesn't raise that signal.
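
The diagnosis can be illustrated with a simplified model. This is only a sketch under assumptions and not the real ComponentStarter: `StarterFutureSketch`, `Component`, `shutdownQuietly()` and `shutdownAndSignal()` are hypothetical names. It shows a component that stops without completing the future, next to one whose shutdown path also raises the signal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StarterFutureSketch {

    /** Hypothetical stand-in for a server component; not the BookKeeper class. */
    static final class Component {
        private volatile boolean running = true;
        private Runnable exitSignal = () -> { };

        void onExit(Runnable signal) { this.exitSignal = signal; }

        /** Models shutdown(ExitCode.ZK_REG_FAIL): stops the component, notifies nobody. */
        void shutdownQuietly() { running = false; }

        /** Models a shutdown path that also raises the exit signal. */
        void shutdownAndSignal() { running = false; exitSignal.run(); }

        boolean isRunning() { return running; }
    }

    /** Returns the future that the main thread would block on. */
    static CompletableFuture<Void> start(Component component) {
        CompletableFuture<Void> stopped = new CompletableFuture<>();
        component.onExit(() -> stopped.complete(null));
        return stopped;
    }

    /** Returns true if the future completes within the given time. */
    static boolean completesWithin(CompletableFuture<Void> future, long millis) {
        try {
            future.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Component quiet = new Component();
        CompletableFuture<Void> first = start(quiet);
        quiet.shutdownQuietly();
        System.out.println("quiet shutdown unblocks main: " + completesWithin(first, 200));

        Component signalling = new Component();
        CompletableFuture<Void> second = start(signalling);
        signalling.shutdownAndSignal();
        System.out.println("signalling shutdown unblocks main: " + completesWithin(second, 200));
    }
}
```

In this model the quiet shutdown leaves the future pending, so a main thread blocked on it would never reach its exit call, while the signalling variant releases it immediately.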

@eolivelli Would you mind taking a look as well? Looking forward to hearing from you.

hezhangjian avatar Jul 07 '23 01:07 hezhangjian

@eolivelli @dlg99 Could you take a look? Thanks.

hezhangjian avatar Jul 28 '23 07:07 hezhangjian