kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-16667 Avoid stale read in KRaftMigrationDriver

Open mumrah opened this issue 1 year ago • 3 comments

When becoming the active KRaftMigrationDriver, there is another race condition similar to KAFKA-16171. This time, the race is due to a stale read from ZK. After writing to /controller and /controller_epoch, it is possible that a read on /migration is not linear with the writes that were just made. In other words, we get a stale read on /migration. This leads to an inability to sync metadata to ZK due to incorrect zkVersion on the migration Znode.

The non-linearizability of reads is in fact documented behavior for ZK, so we need to handle it.

To fix the stale read, this patch adds a write to /migration after updating /controller and /controller_epoch. This allows us to learn the correct zkVersion for the migration ZNode before leaving the BECOME_CONTROLLER state.

This patch also adds a check on the current leader epoch when running certain events in KRaftMigrationDriver. Historically, we did not include this check because it is not necessary for correctness. Writes to ZK are gated on the /controller_epoch zkVersion, and RPCs sent to brokers are gated on the controller epoch. However, during a time of rapid failover, there is a lot of processing happening on the controller (i.e., full metadata sync to ZK and full UMRs sent to brokers), so it is best to avoid running events we know will fail.

There is also a small fix in here to improve the logging of ZK operations. The log message are changed to past tense to reflect the fact that they have already happened by the time the log message is created.

mumrah avatar May 10 '24 16:05 mumrah

Thanks for the review @soarez! Sorry it took me so long to get back to this. Been busy migrating clusters to KRaft :)

mumrah avatar Jun 14 '24 16:06 mumrah

I ran ZkMigrationIntegrationTest locally several times and didn't get any failures. Same for ZkMigrationFailoverTest. Let's see how the next Jenkins run goes.

mumrah avatar Jun 17 '24 15:06 mumrah

I ran the migration system tests on this branch and noticed we had an old test for 3.4 that isn't working any longer. Since 3.4 is not supported for migrations, I just removed it.

Otherwise, here's the system test report:

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.11.4
session_id:       2024-06-25--010
run time:         7 minutes 31.785 seconds
tests run:        4
passed:           4
flaky:            0
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.tests.core.zookeeper_migration_test.TestMigration.test_online_migration.roll_controller=False
status:     PASS
run time:   2 minutes 44.874 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.zookeeper_migration_test.TestMigration.test_online_migration.roll_controller=True
status:     PASS
run time:   2 minutes 47.894 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.zookeeper_migration_test.TestMigration.test_pre_migration_mode_3_4.metadata_quorum=ISOLATED_KRAFT
status:     PASS
run time:   39.488 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.zookeeper_migration_test.TestMigration.test_reconcile_kraft_to_zk
status:     PASS
run time:   1 minute 19.244 seconds
--------------------------------------------------------------------------------

mumrah avatar Jun 26 '24 14:06 mumrah