kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-16813: Add global timeout for "@Test" and "@TestTemplate" (For Github Action)

Open m1a2st opened this issue 1 year ago • 6 comments

Jira: https://issues.apache.org/jira/browse/KAFKA-16813

add ClusterTest and ClusterTests timout for 30 seconds.

This is testing for GitHub Action CI.

Committer Checklist (excluded from commit message)

  • [ ] Verify design and implementation
  • [ ] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

m1a2st avatar Aug 22 '24 03:08 m1a2st

@m1a2st please rebase code and fix conflicts

chia7712 avatar Aug 22 '24 08:08 chia7712

@m1a2st can you pull in the latest trunk changes? I have added code to report the JUnit failures in the Github Actions workflows. I want to do a few runs with this change to ensure we haven't introduced a bunch of timeout failures

mumrah avatar Aug 23 '24 15:08 mumrah

@m1a2st could you please add timeout to @ClusterTemplate?

chia7712 avatar Aug 23 '24 16:08 chia7712

@m1a2st can you pull in the latest trunk changes? I have added code to report the JUnit failures in the Github Actions workflows. I want to do a few runs with this change to ensure we haven't introduced a bunch of timeout failures

I rebased it.

@m1a2st could you please add timeout to @ClusterTemplate?

I think I should add a constant in TestUtils for 30 second, It will be more easy to controll this parameter. WDYT?

m1a2st avatar Aug 24 '24 00:08 m1a2st

I think I should add a constant in TestUtils for 30 second, It will be more easy to controll this parameter. WDYT?

there are only 3 annotations needing the @Timeout(30), so it should be fine to add the timeout annotation individually.

chia7712 avatar Aug 24 '24 01:08 chia7712

| ❌ | LagFetchIntegrationTest | shouldFetchLagsDuringRebalancingWithNoOptimization() | java.lang.AssertionError: 
Expected: <0L>
     but: was <5L> | 63.29s |

I have filed https://issues.apache.org/jira/browse/KAFKA-17418 to fix the markdown

chia7712 avatar Aug 25 '24 12:08 chia7712

@m1a2st Could you please list the timeout tests? Maybe we can file PR to speedup them

chia7712 avatar Sep 02 '24 16:09 chia7712

I will check these tests which timeout 30 second

  • ResetConsumerGroupOffsetTest.testResetOffsetsToEarliestOnTopicsAndPartitions
  • ResetConsumerGroupOffsetTest.testResetOffsetsAllTopicsAllGroups
  • DescribeConsumerGroupTest.testDescribeWithMultiPartitionTopicAndMultipleConsumers
  • DescribeConsumerGroupTest.testDescribeWithConsumersWithoutAssignedPartitions
  • ZkMigrationIntegrationTest.testMigrateTopicDeletions
  • ClientTelemetryTest.testClientInstanceId

This test is timeout for 120s

  • KafkaAdminClientTest.testClientSideTimeoutAfterFailureToReceiveResponse

This is flaky

m1a2st avatar Sep 03 '24 15:09 m1a2st

testClientSideTimeoutAfterFailureToReceiveResponse is known flaky (https://issues.apache.org/jira/browse/KAFKA-15916)

ZkMigrationIntegrationTest is a bit complicated, so it should have different timeout. The other IT needs to be improved to reduce the elapsed time. Could you please file sub-tasks (jira) for each test class?

chia7712 avatar Sep 03 '24 15:09 chia7712

@chia7712 I have a separate PR for ZkMigrationIntegrationTest https://github.com/apache/kafka/pull/17004. This test definitely needs to set its own timeouts as it is rather complicated. In retrospect, some parts of this tests probably should have been system tests.

mumrah avatar Sep 03 '24 15:09 mumrah

I open these jira tasks to trace these timeout tests https://issues.apache.org/jira/browse/KAFKA-17471 https://issues.apache.org/jira/browse/KAFKA-17472 https://issues.apache.org/jira/browse/KAFKA-17473

m1a2st avatar Sep 03 '24 15:09 m1a2st

@m1a2st Could you please rebase code to trigger QA again?

chia7712 avatar Sep 04 '24 17:09 chia7712

@m1a2st Could you please increase the timeout from 30 secs to 60 secs? It seems to me 30 secs is too small. Also, please merge trunk to run CI again

chia7712 avatar Sep 09 '24 12:09 chia7712

#17117 is merged now, so @m1a2st could you please rebase code?

chia7712 avatar Sep 17 '24 20:09 chia7712

@m1a2st Could you please check the failed tests? It seems no cluster tests have timeout error, so maybe we can keep this "60 seconds" WDYT?

chia7712 avatar Sep 19 '24 16:09 chia7712

Hello @chia7712, The fail tests are:

  • OffsetsApiIntegrationTest#testAlterSinkConnectorOffsetsDifferentKafkaClusterTargeted the reason is can't met condition in 30s
  • PlaintextConsumerSubscriptionTest the reason is Failed to close kafka consumer

I think we could keep the 60s timeout for cluster test.

m1a2st avatar Sep 19 '24 17:09 m1a2st

@m1a2st Is the PR title accurate? Does this add a global timeout for any test (@Test) or just our cluster tests?

mumrah avatar Sep 19 '24 17:09 mumrah

@mumrah, Thanks for your reminder, we only add timeout for cluster tests in this PR, I should modify the title.

m1a2st avatar Sep 19 '24 17:09 m1a2st

@m1a2st Do you forgot to add ClusterTest and ClusterTests to the title?

chia7712 avatar Sep 19 '24 17:09 chia7712

I consider the "@TestTemplate" is a symbol for these annotations ClusterTest, ClusterTemplate, ClusterTests, but I will fix the title more clearly.

m1a2st avatar Sep 19 '24 17:09 m1a2st