kaap
Failed to downgrade from version 3.0.0 to 2.10.x
I tried to downgrade a cluster from 3.0.0 to 2.10.5, and the process got stuck because bookkeeper kept crashing and never reached a ready state.
Errors in bookkeeper logs immediately before each crash:
2023-07-28T18:12:07,404+0000 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: instanceId null is not matching with 656d0f97-6d6e-40fa-b319-c008893cbf58
at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:168) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:173) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.bookie.LegacyCookieValidation.verifyAndGetMissingDirs(LegacyCookieValidation.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.bookie.LegacyCookieValidation.checkCookies(LegacyCookieValidation.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.server.EmbeddedServer$Builder.build(EmbeddedServer.java:408) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:277) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.server.Main.doMain(Main.java:216) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
at org.apache.bookkeeper.server.Main.main(Main.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
Errors in operator logs:
18:10:54 INFO [com.dat.oss.kaa.con.PulsarClusterController] (ReconcilerExecutor-pulsar-cluster-app-95) waiting for bookkeeper to become ready
18:10:55 INFO [com.dat.oss.kaa.con.boo.BookKeeperController] (ReconcilerExecutor-pulsar-bk-controller-94) Initializing bookie racks for bookkeeper-set 'bookkeeper'
18:10:55 ERROR [com.dat.oss.kaa.con.AbstractController] (ReconcilerExecutor-pulsar-bk-controller-94) Error during reconciliation for resource bookkeepers.kaap.oss.datastax.com with name pulsar-bookkeeper: KeeperErrorCode = NoNode for /bookies: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bookies
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2028)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304)
at org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:145)
at org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:141)
at com.datastax.oss.kaap.controllers.bookkeeper.racks.client.ZkClientRackClient$ZkNodeOp.get(ZkClientRackClient.java:140)
at com.datastax.oss.kaap.controllers.bookkeeper.racks.BookKeeperRackMonitor.internalRun(BookKeeperRackMonitor.java:73)
at com.datastax.oss.kaap.controllers.bookkeeper.racks.BookKeeperRackDaemon.triggerSync(BookKeeperRackDaemon.java:58)
at com.datastax.oss.kaap.controllers.bookkeeper.BookKeeperController.compareLastAppliedSetSpec(BookKeeperController.java:249)
at com.datastax.oss.kaap.controllers.bookkeeper.BookKeeperController.compareLastAppliedSetSpec(BookKeeperController.java:52)
at com.datastax.oss.kaap.controllers.AbstractResourceSetsController.patchResources(AbstractResourceSetsController.java:125)
at com.datastax.oss.kaap.controllers.AbstractController.reconcile(AbstractController.java:139)
at com.datastax.oss.kaap.controllers.AbstractController.reconcile(AbstractController.java:62)
at com.datastax.oss.kaap.controllers.bookkeeper.BookKeeperController_ClientProxy.reconcile(Unknown Source)
at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:145)
at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:103)
at io.javaoperatorsdk.operator.monitoring.micrometer.MicrometerMetrics.lambda$timeControllerExecution$0(MicrometerMetrics.java:86)
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69)
at io.javaoperatorsdk.operator.monitoring.micrometer.MicrometerMetrics.timeControllerExecution(MicrometerMetrics.java:84)
at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:102)
at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:141)
at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:121)
at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:415)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)