KAFKA-10251: wait for consumer rebalance completed before consuming records
The flaky tests error:
org.opentest4j.AssertionFailedError: Consumed 0 records before timeout instead of the expected 200 records
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:39)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:117)
at kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:833)
at kafka.api.TransactionsBounceTest.testWithGroupId(TransactionsBounceTest.scala:112)
This test is unstable because we bouncing broker during the test to verify the transnational data is still as expected. When the broker down and on, it will let the consumer group rebalance take longer. To fix it, we can make the rebalance happened earlier before broker bouncing, so that we can make sure when test starts, we don't worry about rebalance and polling records directly.
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
@apurvam @hachikuji , could you review this PR? Thanks.
@apurvam @hachikuji , could you review this PR? Thanks.
@apurvam @hachikuji , call for review. Thank you.
@apurvam @hachikuji , please help review. Thank you.
All failed tests are unrelated. Thanks.
Build / JDK 15 and Scala 2.13 / kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopics()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest.testReplication()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest.testOneWayReplicationWithAutoOffsetSync()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest.testReplication()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest.testReplicationWithEmptyPartition()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest.testOneWayReplicationWithAutoOffsetSync()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable
Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable
@mumrah maybe you can help review this.
@mumrah , thanks for the comments. This flaky test only can be reproduced in a very slow machine, like jenkins. I investigate it by adding some debug log and run in jenkins, and check the logs when failed. After my fix, I run it 100 times on jenkins and no failed happened. I believe it's a reasonable fix. Thanks.
failed tests are unrelated and flaky (and no TransactionsBounceTest failed):
Build / JDK 15 and Scala 2.13 / kafka.network.SocketServerTest.testClientDisconnectionUpdatesRequestMetrics()
Build / JDK 8 and Scala 2.12 / kafka.api.TransactionsTest.testAbortTransactionTimeout()
Build / JDK 8 and Scala 2.12 / kafka.network.SocketServerTest.testUnmuteChannelWithBufferedReceives()
Build / JDK 11 and Scala 2.13 / kafka.server.MultipleListenersWithAdditionalJaasContextTest.testProduceConsume()
Thanks.
@mumrah , could you please take a look again? Thank you.
@mumrah , could you help have a 2nd review? Thanks.
@mumrah ,call for 2nd review? Thanks.
@mumrah , it failed frequently these days. I think we should merge this fix soon. Please help have a 2nd review when available. Thank you.