kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-10251: wait for consumer rebalance completed before consuming records

Open showuon opened this issue 4 years ago • 12 comments

The flaky tests error:

org.opentest4j.AssertionFailedError: Consumed 0 records before timeout instead of the expected 200 records
	at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:39)
	at org.junit.jupiter.api.Assertions.fail(Assertions.java:117)
	at kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:833)
	at kafka.api.TransactionsBounceTest.testWithGroupId(TransactionsBounceTest.scala:112)

This test is unstable because we bouncing broker during the test to verify the transnational data is still as expected. When the broker down and on, it will let the consumer group rebalance take longer. To fix it, we can make the rebalance happened earlier before broker bouncing, so that we can make sure when test starts, we don't worry about rebalance and polling records directly.

Committer Checklist (excluded from commit message)

  • [ ] Verify design and implementation
  • [ ] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

showuon avatar Mar 17 '21 05:03 showuon

@apurvam @hachikuji , could you review this PR? Thanks.

showuon avatar Mar 17 '21 07:03 showuon

@apurvam @hachikuji , could you review this PR? Thanks.

showuon avatar Mar 23 '21 03:03 showuon

@apurvam @hachikuji , call for review. Thank you.

showuon avatar Apr 06 '21 06:04 showuon

@apurvam @hachikuji , please help review. Thank you.

showuon avatar Apr 13 '21 13:04 showuon

All failed tests are unrelated. Thanks.


    Build / JDK 15 and Scala 2.13 / kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopics()
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest.testReplication()
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest.testOneWayReplicationWithAutoOffsetSync()
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest.testReplication()
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest.testReplicationWithEmptyPartition()
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest.testOneWayReplicationWithAutoOffsetSync()
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable
    Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable

showuon avatar Apr 16 '21 13:04 showuon

@mumrah maybe you can help review this.

ijuma avatar May 19 '21 12:05 ijuma

@mumrah , thanks for the comments. This flaky test only can be reproduced in a very slow machine, like jenkins. I investigate it by adding some debug log and run in jenkins, and check the logs when failed. After my fix, I run it 100 times on jenkins and no failed happened. I believe it's a reasonable fix. Thanks.

showuon avatar May 20 '21 02:05 showuon

failed tests are unrelated and flaky (and no TransactionsBounceTest failed):

    Build / JDK 15 and Scala 2.13 / kafka.network.SocketServerTest.testClientDisconnectionUpdatesRequestMetrics()
    Build / JDK 8 and Scala 2.12 / kafka.api.TransactionsTest.testAbortTransactionTimeout()
    Build / JDK 8 and Scala 2.12 / kafka.network.SocketServerTest.testUnmuteChannelWithBufferedReceives()
    Build / JDK 11 and Scala 2.13 / kafka.server.MultipleListenersWithAdditionalJaasContextTest.testProduceConsume()

Thanks.

showuon avatar May 20 '21 07:05 showuon

@mumrah , could you please take a look again? Thank you.

showuon avatar May 26 '21 13:05 showuon

@mumrah , could you help have a 2nd review? Thanks.

showuon avatar Jun 08 '21 02:06 showuon

@mumrah ,call for 2nd review? Thanks.

showuon avatar Jun 18 '21 06:06 showuon

@mumrah , it failed frequently these days. I think we should merge this fix soon. Please help have a 2nd review when available. Thank you.

showuon avatar Jul 08 '21 12:07 showuon