
Failed to downgrade from version 3.0.0 to 2.10.x

Status: Open. pgier opened this issue 2 years ago • 0 comments

I tried to downgrade a cluster from 3.0.0 to 2.10.5, and the process got stuck because BookKeeper kept crashing and never reached a ready state.

Errors in bookkeeper logs immediately before each crash:

2023-07-28T18:12:07,404+0000 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: instanceId null is not matching with 656d0f97-6d6e-40fa-b319-c008893cbf58
        at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:168) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:173) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.bookie.LegacyCookieValidation.verifyAndGetMissingDirs(LegacyCookieValidation.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.bookie.LegacyCookieValidation.checkCookies(LegacyCookieValidation.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.server.EmbeddedServer$Builder.build(EmbeddedServer.java:408) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:277) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.server.Main.doMain(Main.java:216) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
        at org.apache.bookkeeper.server.Main.main(Main.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]

Errors in operator logs:

18:10:54 INFO  [com.dat.oss.kaa.con.PulsarClusterController] (ReconcilerExecutor-pulsar-cluster-app-95) waiting for bookkeeper to become ready
18:10:55 INFO  [com.dat.oss.kaa.con.boo.BookKeeperController] (ReconcilerExecutor-pulsar-bk-controller-94) Initializing bookie racks for bookkeeper-set 'bookkeeper'
18:10:55 ERROR [com.dat.oss.kaa.con.AbstractController] (ReconcilerExecutor-pulsar-bk-controller-94) Error during reconciliation for resource bookkeepers.kaap.oss.datastax.com with name pulsar-bookkeeper: KeeperErrorCode = NoNode for /bookies: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bookies
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2028)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:145)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:141)
        at com.datastax.oss.kaap.controllers.bookkeeper.racks.client.ZkClientRackClient$ZkNodeOp.get(ZkClientRackClient.java:140)
        at com.datastax.oss.kaap.controllers.bookkeeper.racks.BookKeeperRackMonitor.internalRun(BookKeeperRackMonitor.java:73)
        at com.datastax.oss.kaap.controllers.bookkeeper.racks.BookKeeperRackDaemon.triggerSync(BookKeeperRackDaemon.java:58)
        at com.datastax.oss.kaap.controllers.bookkeeper.BookKeeperController.compareLastAppliedSetSpec(BookKeeperController.java:249)
        at com.datastax.oss.kaap.controllers.bookkeeper.BookKeeperController.compareLastAppliedSetSpec(BookKeeperController.java:52)
        at com.datastax.oss.kaap.controllers.AbstractResourceSetsController.patchResources(AbstractResourceSetsController.java:125)
        at com.datastax.oss.kaap.controllers.AbstractController.reconcile(AbstractController.java:139)
        at com.datastax.oss.kaap.controllers.AbstractController.reconcile(AbstractController.java:62)
        at com.datastax.oss.kaap.controllers.bookkeeper.BookKeeperController_ClientProxy.reconcile(Unknown Source)
        at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:145)
        at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:103)
        at io.javaoperatorsdk.operator.monitoring.micrometer.MicrometerMetrics.lambda$timeControllerExecution$0(MicrometerMetrics.java:86)
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69)
        at io.javaoperatorsdk.operator.monitoring.micrometer.MicrometerMetrics.timeControllerExecution(MicrometerMetrics.java:84)
        at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:102)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:141)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:121)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
        at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:415)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
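
For triage, the two distinct failures in the logs above (the bookie cookie `instanceId` mismatch and the missing `/bookies` znode) can be pulled out of a longer log dump with a small script. This is only a sketch: the function name `summarize` and both regular expressions are assumptions written against the exact message text quoted in this issue, not part of kaap or BookKeeper, and other versions may word these messages differently.

```python
import re

# Assumed pattern, based on the bookie log message quoted in this issue.
COOKIE_RE = re.compile(
    r"InvalidCookieException: instanceId (\S+) is not matching with (\S+)"
)
# Assumed pattern, based on the operator log message quoted in this issue.
NONODE_RE = re.compile(r"KeeperErrorCode = NoNode for (\S+)")


def summarize(log_text):
    """Collect cookie instanceId mismatches and missing znode paths from log text."""
    # Each match is a (found_instance_id, expected_instance_id) tuple.
    mismatches = sorted(set(COOKIE_RE.findall(log_text)))
    # The path is sometimes followed by a colon in the log line, so strip it.
    missing = sorted({path.rstrip(":") for path in NONODE_RE.findall(log_text)})
    return {"cookie_mismatches": mismatches, "missing_znodes": missing}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe pod logs into the script on stdin.
    print(summarize(sys.stdin.read()))
```

Fed the two log excerpts from this issue, it should report one mismatch (found `null`, expected `656d0f97-6d6e-40fa-b319-c008893cbf58`) and one missing znode (`/bookies`).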

pgier, Jul 28 '23 18:07