pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

Not enough non-faulty bookies available

Open yunfeng79 opened this issue 4 years ago • 15 comments

apache-pulsar-2.8.0

three pulsar cluster

second pulsar messages:

16:33:15.772 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (135, 0): Bookie handle is not available 16:33:15.801 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=135,loopId=413ab672) Exception updating org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231] 16:33:15.802 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.LedgerHandle - [EnsembleChange(ledger:135, change-id:0000000001)][attempt:1] Exception changing ensemble org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]

yunfeng79 avatar Dec 07 '21 08:12 yunfeng79

@yunfeng79 Not enough non-faulty bk means your alive bk node number is less than broker‘s managedLedgerDefaultEnsembleSize config request https://github.com/apache/pulsar/blob/master/conf/broker.conf#L902. You can take a check on it.

ericsyh avatar Dec 07 '21 13:12 ericsyh

I got the same error, my broker config is managedLedgerDefaultEnsembleSize=2

what should I do to fix this error?

sicklife avatar Dec 12 '21 03:12 sicklife

Pls run the command bin/bookkeeper shell listbookies -rw -h to get the list the ReadWrite Bookies. And check that all broker configs keep consistent managedLedgerDefaultEnsembleSize, managedLedgerDefaultWriteQuorum, managedLedgerDefaultAckQuorum.

Then check that ReadWrite Bookies number >= managedLedgerDefaultEnsembleSize >= managedLedgerDefaultWriteQuorum >= managedLedgerDefaultAckQuorum.

ericsyh avatar Dec 13 '21 13:12 ericsyh

@ericsyh check that ReadWrite Bookies number is same,Still have the same error?

yunfeng79 avatar Dec 16 '21 08:12 yunfeng79

java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:714) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) [?:1.8.0_301] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) [?:1.8.0_301] at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:691) [org.apache.pulsar-pulsar-client-original-2.8.0.jar:2.8.0] at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:176) [org.apache.pulsar-pulsar-common-2.8.0.jar:2.8.0] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [io.netty-netty-transport-native-epoll-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [io.netty-netty-transport-native-epoll-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [io.netty-netty-transport-native-epoll-4.1.63.Final.jar:4.1.63.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]

yunfeng79 avatar Dec 16 '21 08:12 yunfeng79

The issue had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Feb 26 '22 01:02 github-actions[bot]

I encountered the same issue when I run bin/bookkeeper shell bookiesanity. I'm using the pulsar-2.10.0.

2022-05-02T12:44:16,501+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.SanityTestCommand - Create ledger 5
2022-05-02T12:44:16,656+0800 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x5a822f2d, L:/host:59178]/host:3181, current state CONNECTING :
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /host:3181
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
	at io.netty.channel.unix.Errors.newConnectException0(Errors.java:155) ~[io.netty-netty-transport-native-unix-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:128) ~[io.netty-netty-transport-native-unix-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.unix.Socket.finishConnect(Socket.java:320) ~[io.netty-netty-transport-native-unix-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2022-05-02T12:44:16,677+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (5, 0): Bookie handle is not available
2022-05-02T12:44:16,691+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5,loopId=567e98f4) Exception updating
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
	at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.4.jar:4.14.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2022-05-02T12:44:16,693+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] WARN  org.apache.bookkeeper.client.LedgerHandle - [EnsembleChange(ledger:5, change-id:0000000001)][attempt:1] Exception changing ensemble
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
	at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.14.4.jar:4.14.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2022-05-02T12:44:16,693+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] ERROR org.apache.bookkeeper.client.LedgerHandle - Closing ledger 5 due to NotEnoughBookiesException: Not enough non-faulty bookies available
2022-05-02T12:44:16,696+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] ERROR org.apache.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: L5 E0
2022-05-02T12:44:16,696+0800 [main] WARN  org.apache.bookkeeper.tools.cli.commands.bookie.SanityTestCommand - Error in bookie sanity test
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
	at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:251) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.AsyncCallback$AddCallback.addCompleteWithLatency(AsyncCallback.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:431) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:1799) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.client.LedgerHandle$5.safeRun(LedgerHandle.java:574) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.14.4.jar:4.14.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_262]
2022-05-02T12:44:16,704+0800 [main-EventThread] WARN  org.apache.bookkeeper.meta.AbstractZkLedgerManager - Ledger node does not exist in ZooKeeper: ledgerId=5
2022-05-02T12:44:16,704+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5,loopId=48132b31) Error writing metadata to store
org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsOnMetadataServerException: No such ledger exists on Metadata Server
	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$4.processResult(AbstractZkLedgerManager.java:505) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$22$1.processResult(ZooKeeperClient.java:1094) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:638) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
2022-05-02T12:44:16,704+0800 [main-EventThread] WARN  org.apache.bookkeeper.client.LedgerHandle - Close failed: NoSuchLedgerExistsOnMetadataServerException: No such ledger exists on Metadata Server
2022-05-02T12:44:16,709+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.SanityTestCommand - Deleted ledger 5
2022-05-02T12:44:16,710+0800 [main] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Closing the per channel bookie client for host:3181
2022-05-02T12:44:16,711+0800 [main] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Closing the per channel bookie client for host:3181
2022-05-02T12:44:16,836+0800 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x30c3eef1b130009 closed
2022-05-02T12:44:16,836+0800 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x30c3eef1b130009

Hongten avatar May 02 '22 04:05 Hongten

The issue had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Jun 03 '22 02:06 github-actions[bot]

any update ? still stuck with this~

sicklife avatar Jun 13 '22 15:06 sicklife

Same problem here. We are unable to restart our cluster. Brokers keep on crashing all the time due to this issue.

Any workaround?

fmiguelez avatar Jun 17 '22 08:06 fmiguelez

Pls run the command bin/bookkeeper shell listbookies -rw -h to get the list the ReadWrite Bookies. And check that all broker configs keep consistent managedLedgerDefaultEnsembleSize, managedLedgerDefaultWriteQuorum, managedLedgerDefaultAckQuorum.

Then check that ReadWrite Bookies number >= managedLedgerDefaultEnsembleSize >= managedLedgerDefaultWriteQuorum >= managedLedgerDefaultAckQuorum.

When I issue that command from our toolset pod I get the following:

08:54:48.798 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :
08:54:48.812 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:rev-pulsar-bookie-2.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local:3181, IP:10.72.168.230, Port:3181, Hostname:rev-pulsar-bookie-2.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local
08:54:48.816 [main-SendThread(rev-pulsar-zookeeper:2181)] WARN  org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x10053b99a7100f5.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x10053b99a7100f5, likely server has closed socket
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
08:54:48.933 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x10053b99a7100f5 closed
08:54:48.933 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10053b99a7100f5

@ericsyh what do you suggest?

fmiguelez avatar Jun 17 '22 08:06 fmiguelez

@fmiguelez Your bookie replica should be more than 3 but the returned readwrite bookie is only rev-pulsar-bookie-2.

Is the bookie pod rev-pulsar-bookie-0 and rev-pulsar-bookie-1 still running? You can check their disk free status by https://bookkeeper.apache.org/docs/admin/http#endpoint-apiv1bookieinfo

In the toolset pod, you can try curl rev-pulsar-bookie-0.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local:8000/api/v1/bookie/info

ericsyh avatar Jun 17 '22 12:06 ericsyh

@fmiguelez Your bookie replica should be more than 3 but the returned readwrite bookie is only rev-pulsar-bookie-2.

Is the bookie pod rev-pulsar-bookie-0 and rev-pulsar-bookie-1 still running? You can check their disk free status by https://bookkeeper.apache.org/docs/admin/http#endpoint-apiv1bookieinfo

In the toolset pod, you can try curl rev-pulsar-bookie-0.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local:8000/api/v1/bookie/info

It was actually a matter of disk space. Bookie logs reported disk usage beyond threshold, so they were set in read-only mode.

	fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-0
	09:05:58.784 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 2447400960, Used space fraction: 0.953402 > threshold 0.95.
	09:05:58.826 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.LedgerDirsManager - No writable ledger dirs below diskUsageThreshold. But Dirs that can accommodate 1073741824 are: [/pulsar/data/bookkeeper/ledgers/current]
	fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-1
	09:05:57.685 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 2414260224, Used space fraction: 0.95403296 > threshold 0.95.
	09:05:57.734 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.LedgerDirsManager - No writable ledger dirs below diskUsageThreshold. But Dirs that can accommodate 1073741824 are: [/pulsar/data/bookkeeper/ledgers/current]
	fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-2
	08:44:00.631 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x2b678089, L:/10.72.168.230:3181 ! R:/10.72.168.138:39958]
	08:56:59.187 [GarbageCollectorThread-11-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Disk almost full, suspend major compaction to slow down filling disk.
	fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-3
	09:07:18.867 [LedgerDirsMonitorThread] WARN  org.apache.bookkeeper.bookie.LedgerDirsMonitor - LedgerDirsMonitor check process: All ledger directories are non writable
	09:07:18.868 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 2292150272, Used space fraction: 0.9563579 > threshold 0.95.  

I modified the bookie configmap to set a threshold of 0.98.

fer@N315D154:~$ kubectl -n mdp-dgt-review-check-data-ocdup2 get configmaps rev-pulsar-bookie -o yaml > rev-pulsar-bookie-cm.yaml Edit pulsar-bookie-cm.yml and add following property: diskUsageThreshold: "0.98" And then applied changes: fer@N315D154:~$ kubectl -n mdp-dgt-review-check-data-ocdup2 apply -f rev-pulsar-bookie-cm.yaml

For the bookie pods to get the change I deleted manually one by one the bookie replica pods. After that broker was able to start.

However this situation lasted not long as eventually two of the bookies started to occupy all the space.

I analyzed all the topics stats and actually some of them returned error:

	root@rev-pulsar-toolset-0:/pulsar/bin# for i in $(./pulsar-admin namespaces list dbus); do topics=$(./pulsar-admin topics list $i); for topic in $topics; do echo $topic; ./pulsar-admin topics stats $topic | grep storageSize; done; done
	"persistent://dbus/dgt-measure-point-15min-local/traffic-data-events"
	  "storageSize" : 43968,
	"persistent://dbus/dgt-measure-point-15min-local/status"
	HTTP 500 Server Error

	Reason: HTTP 500 Server Error
	"persistent://dbus/dgt-time-series-point/traffic-data-events"
	  "storageSize" : 2732639642,
	"persistent://dbus/dgt-time-series-point/status"
	HTTP 500 Server Error

	Reason: HTTP 500 Server Error
	"persistent://dbus/test-ltp-config/variables"
	  "storageSize" : 0,

Every bookie replica pod has a volume of 50 GiB for ledgers and 10 GiB for journals. The volumen being filled up was the one of the ledgers. As you can see only one topic has a considerable size (2.5 GiB), which does not make any sense.

There were no producers producing any more data at the moment of the error.

My suspicion is that the compaction process (enabled for all topics and set to 10 MB threshold) is the one filling up disk (probably with temporary data copy). It can be a problem as biggest topic does not have any repeating key (compaction does not make any sense in this case). I have observed logs like the following

11:19:53.242 [GarbageCollectorThread-11-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Disk almost full, suspend major compaction to slow down filling disk.
11:19:53.242 [GarbageCollectorThread-11-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Enter minor compaction, suspendMinor false
11:19:53.242 [GarbageCollectorThread-11-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Do compaction to compact those files lower than 0.2

I have removed my namespace and redeployed two versions of the services:

  • One without compaction enabled.
  • Another one identical to the one reporting the issues (clean).

Will check on Monday differences between both environments.

fmiguelez avatar Jun 17 '22 13:06 fmiguelez

I encountered the same issue when I run bin/bookkeeper shell bookiesanity. I'm using the pulsar-2.10.0.

I had the same problem

cellcb avatar Jul 18 '22 08:07 cellcb

@fmiguelez

I'm also trying to find the solution.

Could you please update anything of your finding?

What I have done is configuring these two setting in bookie's configurations.

  • minorCompactionThreshold : -1
  • majorCompactionThreshold : -1

image image

moweonlee avatar Sep 21 '22 14:09 moweonlee

@fmiguelez

In my case, What frankjkelly said works for me.

I think what he had done is to minimize Compaction triggering interval for more aggressive GC.

https://github.com/apache/pulsar/issues/6639

moweonlee avatar Sep 22 '22 01:09 moweonlee

@moweonlee I have reviewed my notes of that day. The result was the same, if compaction was enabled at namespace level, bookies unreadable due to disk full.

In our systems we usually have compaction enabled globally but allow disabling it at topic level. We were surprised to check that such feature does not work anymore (I would swear to have seen it working in previous versions). We reported the issue in #15891, as such error seems to be related when trying to configure compaction at topic level.

The only solution we have found is to disable compaction at namespace level when creating it. In that case bookies disk usage does not increase uncontrollably as with compaction.

I would say that compaction is not a feature that works flawlessly in Pulsar, though on paper it sounds great to be able to work with both compacted and non-compacted topics at the same time.

fmiguelez avatar Sep 22 '22 07:09 fmiguelez

Yes, you're right @fmiguelez.

Same issue was reproduced after couple of days and I solved my issue just by reducing retention and TTL finally.

But I'd rather swear that the issue would be reproduced with longer interval than before.

Thank you for sharing your information about compaction.

moweonlee avatar Oct 06 '22 10:10 moweonlee

@moweonlee , could you please let me know what exact config/value did you update for retention and TTL to get this issue resolved.

arpillai89 avatar Nov 16 '22 23:11 arpillai89

@arpillai89

Here is the configuration that I used for my pulsar cluster.

I didn't think about how much storage volume is necessary to store all data incoming.

Now it is clear that my case was pretty simple.

So it will be good to see Grafana's storage volume chart first as long as Pulsar is deployed in K8s

./bin/pulsar-admin namespaces set-retention public/default --size 8G --time -1
./bin/pulsar-admin namespaces get-retention public/default 

./bin/pulsar-admin namespaces set-message-ttl public/default --messageTTL 60
./bin/pulsar-admin namespaces get-message-ttl public/defaul

moweonlee avatar Nov 24 '22 21:11 moweonlee

Pls run the command bin/bookkeeper shell listbookies -rw -h to get the list the ReadWrite Bookies. And check that all broker configs keep consistent managedLedgerDefaultEnsembleSize, managedLedgerDefaultWriteQuorum, managedLedgerDefaultAckQuorum.

Then check that ReadWrite Bookies number >= managedLedgerDefaultEnsembleSize >= managedLedgerDefaultWriteQuorum >= managedLedgerDefaultAckQuorum.

I deploy my pulsar demo with 1 zk, 1 bookie and 1 broker. When I create the topic the same error occurs(Not enough non-faulty bookies available). Then I checked the 3 options that all the default value is 2. When I change all the values to 1, and then restart the broker, the problem solved. Thanks.

hellboy0621 avatar Apr 15 '23 11:04 hellboy0621

BROKER: webSocketServiceEnabled: "true" # Not enough non-faulty bookies available

eaba avatar Oct 07 '23 10:10 eaba

@eaba What does your comment mean?

Enable this? Disable this?

ton3r avatar Dec 21 '23 13:12 ton3r

I have some problem with pulsar:latest docker image

samet-karakose avatar Dec 27 '23 00:12 samet-karakose