Pinot Server Graceful Shutdown Improvements
The current graceful shutdown steps are listed below in order (along with the issues):
- PinotFSFactory is shut down first. Issue: this fails the segment upload for any segment that is committing while the server is shutting down.
- HelixManager disconnects, which shuts down the ZkClient. Issue: this happens before the segment data managers are destroyed, so a segment that tries to commit after the disconnect throws because the ZkClient is already shut down.
- Server instance shutdown issues destroy for all SegmentDataManager instances. For realtime tables, this attempts to stop ingestion by joining with the consumer thread of each consuming segment. (No issue with this.)
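The ordering problem in the second step can be reproduced with a toy model. This is a minimal sketch, not Pinot code: FakeZkClient, FakeSegmentDataManager and shutdown are hypothetical stand-ins, and the in-flight commit is modeled as a metadata read that happens while the segment is being destroyed. It only illustrates why disconnecting before the segments are destroyed fails. It is not meant as the actual fix.

```java
public class ShutdownOrderSketch {
    /** Hypothetical stand-in for a ZK client: reads throw once it is closed. */
    static class FakeZkClient {
        private boolean closed = false;

        void close() {
            closed = true;
        }

        String readTableConfig() {
            if (closed) {
                throw new IllegalStateException("ZkClient already closed!");
            }
            return "tableConfig";
        }
    }

    /** Hypothetical stand-in for a consuming segment with a commit in flight. */
    static class FakeSegmentDataManager {
        private final FakeZkClient zk;

        FakeSegmentDataManager(FakeZkClient zk) {
            this.zk = zk;
        }

        /** Models destroy() waiting for a commit that still needs ZK metadata. */
        void destroy() {
            zk.readTableConfig();
        }
    }

    /** Runs the two shutdown steps in the requested order and reports the outcome. */
    static String shutdown(boolean disconnectFirst) {
        FakeZkClient zk = new FakeZkClient();
        FakeSegmentDataManager segment = new FakeSegmentDataManager(zk);
        try {
            if (disconnectFirst) {
                zk.close();        // order described in this issue
                segment.destroy();
            } else {
                segment.destroy(); // assumed alternative order, for comparison only
                zk.close();
            }
            return "committed";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(shutdown(true));
        System.out.println(shutdown(false));
    }
}
```

With disconnectFirst set to true, the commit hits the closed client and fails with the same message as in the stack trace below. With the steps swapped, the commit completes before the client is closed.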
All this means that a segment can go into error state: the deep-store link for the segment is missing, and peer download does not work because a server restart usually takes at least 2 minutes while the current retry logic only waits 10-15s for an Online peer. (By default the replica will take 31 seconds to catch up to the final offset; if it can't, we try to download the segment instead.)
If instead of a server restart it was a host failure, then we could also have data loss.
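The timing mismatch can be made concrete with a small simulation. This is a sketch under stated assumptions, not Pinot's retry logic: findOnlinePeer is a hypothetical helper, the 5-second poll interval is invented for illustration, and only the 2-minute restart and the 15s upper end of the wait window come from the description above. It uses a simulated clock, so nothing actually sleeps.

```java
public class PeerDownloadWindowSketch {
    /**
     * Polls for an Online peer on a simulated clock.
     *
     * @param peerOnlineAtSec second at which the restarting peer is Online again
     * @param retryWindowSec  how long the retry logic keeps looking for a peer
     * @param pollIntervalSec assumed gap between attempts (hypothetical value)
     * @return the second at which a peer was found, or -1 if the window expired
     */
    static long findOnlinePeer(long peerOnlineAtSec, long retryWindowSec, long pollIntervalSec) {
        for (long now = 0; now <= retryWindowSec; now += pollIntervalSec) {
            if (now >= peerOnlineAtSec) {
                return now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Peer needs a 2-minute restart, retry window is 15s: no peer is found.
        System.out.println(findOnlinePeer(120, 15, 5));
        // Peer is back after 10s, inside the window: found at the third attempt.
        System.out.println(findOnlinePeer(10, 15, 5));
    }
}
```

The first call returns -1 because every attempt happens long before the peer is back. The second returns 10, which shows the window only helps when the peer recovers within it.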
I am working offline with some stakeholders on the fixes.
Exception seen due to closed ZkClient:
2023-06-09 03:00:29.552 [some_table__242__1314__20230609T0216Z] ERROR o.a.p.c.d.m.r.LLRealtimeSegmentDataManager_some_table__242__1314__20230609T0216Z - Exception while in work
java.lang.IllegalStateException: ZkClient already closed!
at org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1977)
at org.apache.helix.zookeeper.zkclient.ZkClient.readData(ZkClient.java:2139)
at org.apache.helix.zookeeper.zkclient.ZkClient.readData(ZkClient.java:2131)
at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:495)
at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.get(ZkCacheBaseDataAccessor.java:397)
at org.apache.helix.store.zk.AutoFallbackPropertyStore.get(AutoFallbackPropertyStore.java:101)
at org.apache.pinot.common.metadata.ZKMetadataProvider.getTableConfig(ZKMetadataProvider.java:308)
at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.replaceLLSegment(RealtimeTableDataManager.java:665)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.commitSegment(LLRealtimeSegmentDataManager.java:1012)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:728)
at java.base/java.lang.Thread.run(Thread.java:829)
cc: @Jackie-Jiang