
Flaky-test: BadVersionExceptions in OneWayReplicatorUsingGlobalZKTest.cleanup

Open · lhotari opened this issue on Oct 17, 2024 · 0 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Example failure

https://github.com/apache/pulsar/actions/runs/11368523712/job/31653376302?pr=23468#step:11:2008

Logs: https://gist.github.com/lhotari/02be1e0d55026ca51730e6d932dfe1bc

Additional context

This seems to block all Pulsar CI PR build jobs from completing successfully at the moment (Thu Oct 17 09:26:11 UTC 2024).

Exception stacktrace

  Error:  Failures: 
  Error:    OneWayReplicatorUsingGlobalZKTest.cleanup » TestNGRuntime org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: 
   --- An unexpected error occurred in the server ---
  
  Message: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/public/default/persistent/tp_-70132750-44af-4d14-817c-219034d2b7be-partition-0/pulsar.repl.r2
  
  Stacktrace:
  
  org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/public/default/persistent/tp_-70132750-44af-4d14-817c-219034d2b7be-partition-0/pulsar.repl.r2
  	at org.apache.pulsar.broker.service.persistent.PersistentTopic$6.deleteLedgerFailed(PersistentTopic.java:1546)
  	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.lambda$asyncDelete$35(ManagedLedgerImpl.java:2978)
  	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
  	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
  	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
  	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
  	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.lambda$asyncTruncate$58(ManagedLedgerImpl.java:4372)
  	at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
  	at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974)
  	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
  	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
  	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$26.clearBacklogFailed(ManagedLedgerImpl.java:4363)
  	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$16.markDeleteFailed(ManagedCursorImpl.java:1767)
  	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$28.operationFailed(ManagedCursorImpl.java:2940)
  	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$31.lambda$operationFailed$0(ManagedCursorImpl.java:3317)
  	at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:787)
  	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
  	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
  	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.lambda$deleteLedgerAsync$39(ManagedCursorImpl.java:3051)
  	at org.apache.bookkeeper.client.LedgerDeleteOp.lambda$initiate$0(LedgerDeleteOp.java:86)
  	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
  	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
  	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
  	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
  	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
  	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  	at java.base/java.lang.Thread.run(Thread.java:1583)
  Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/public/default/persistent/tp_-70132750-44af-4d14-817c-219034d2b7be-partition-0/pulsar.repl.r2
2024-10-17T04:15:33,865 - INFO  - [broker-topic-workers-OrderedExecutor-2-0:AbstractMetadataStore] - Deleting path: /ledgers/00/0000/L0032 (v. Optional.empty)
2024-10-17T04:15:33,865 - WARN  - [bookkeeper-ml-scheduler-OrderedScheduler-3-0:ManagedLedgerImpl] - [public/ns_73b1a31afce34671a5ddc48fe5ad7fc8/persistent/___tp-5dd50794-7af8-4a34-8a0b-06188052c66a] Failed to delete managed ledger
org.apache.bookkeeper.mledger.ManagedLedgerException$MetaStoreException: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /managed-ledgers/public/ns_73b1a31afce34671a5ddc48fe5ad7fc8/persistent/___tp-5dd50794-7af8-4a34-8a0b-06188052c66a
Caused by: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /managed-ledgers/public/ns_73b1a31afce34671a5ddc48fe5ad7fc8/persistent/___tp-5dd50794-7af8-4a34-8a0b-06188052c66a
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:781) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194) ~[?:?]
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$internalStoreDelete$13(ZKMetadataStore.java:391) ~[pulsar-metadata-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.113.Final.jar:4.1.113.Final]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /managed-ledgers/public/ns_73b1a31afce34671a5ddc48fe5ad7fc8/persistent/___tp-5dd50794-7af8-4a34-8a0b-06188052c66a
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.getException(ZKMetadataStore.java:486) ~[pulsar-metadata-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$internalStoreDelete$13(ZKMetadataStore.java:391) ~[pulsar-metadata-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) [?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) [?:?]
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.113.Final.jar:4.1.113.Final]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /managed-ledgers/public/ns_73b1a31afce34671a5ddc48fe5ad7fc8/persistent/___tp-5dd50794-7af8-4a34-8a0b-06188052c66a
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:117) ~[zookeeper-3.9.2.jar:3.9.2]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:53) ~[zookeeper-3.9.2.jar:3.9.2]
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.getException(ZKMetadataStore.java:480) ~[pulsar-metadata-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$internalStoreDelete$13(ZKMetadataStore.java:391) ~[pulsar-metadata-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.113.Final.jar:4.1.113.Final]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[?:?]
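Some background on the first exception may help. ZooKeeper raises `KeeperException$BadVersionException` when a conditional write or delete passes an expected znode version that no longer matches the node's current version. This usually happens because another writer updated the node first. In the trace above, this happens on the replicator cursor node (`.../pulsar.repl.r2`) while the topic is being deleted. The Java sketch below shows only that optimistic-concurrency check. `VersionedStore`, `conditionalPut` and this `BadVersionException` are hypothetical stand-ins, not Pulsar's `MetadataStore` or the ZooKeeper client API. The sketch does not claim to reproduce the actual race in `OneWayReplicatorUsingGlobalZKTest`.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionedStoreDemo {

    /** Hypothetical stand-in for a version-mismatch error (not ZooKeeper's class). */
    static class BadVersionException extends Exception {
        BadVersionException(String path, int expected, int actual) {
            super("BadVersion for " + path + ": expected " + expected + ", actual " + actual);
        }
    }

    /** Hypothetical in-memory store: every successful write bumps the node's version. */
    static class VersionedStore {
        private final Map<String, byte[]> data = new HashMap<>();
        private final Map<String, Integer> versions = new HashMap<>();

        synchronized int create(String path, byte[] value) {
            data.put(path, value);
            versions.put(path, 0);
            return 0;
        }

        synchronized int version(String path) {
            return versions.get(path);
        }

        /** Conditional write: succeeds only if the caller's expected version is still current. */
        synchronized int conditionalPut(String path, byte[] value, int expectedVersion)
                throws BadVersionException {
            int current = versions.get(path);
            if (current != expectedVersion) {
                throw new BadVersionException(path, expectedVersion, current);
            }
            data.put(path, value);
            versions.put(path, current + 1);
            return current + 1;
        }
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        String path = "/example/cursor"; // hypothetical path, for illustration only
        store.create(path, new byte[0]);

        // Two writers read the same version before either of them writes.
        int seenByA = store.version(path);
        int seenByB = store.version(path);

        try {
            store.conditionalPut(path, "A".getBytes(), seenByA); // succeeds, version becomes 1
            store.conditionalPut(path, "B".getBytes(), seenByB); // stale expected version 0, rejected
        } catch (BadVersionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `BadVersion for /example/cursor: expected 0, actual 1`. The second writer is rejected because it still holds the version it read before the first writer's update.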

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

lhotari, Oct 17 '24 07:10