
FlakyIT: ITHighAvailabilityTest

paul-rogers opened this issue.

The integration test ITHighAvailabilityTest failed in this build:

[ERROR] Failures: 
[ERROR]   ITHighAvailabilityTest.testCoordinatorCluster:207 » ISE Max number of retries[...

Details:

2022-06-14T18:33:50,356 INFO [main] org.apache.druid.testing.utils.DruidClusterAdminClient - 307 Temporary Redirect 
2022-06-14T18:33:50,356 INFO [main] org.apache.druid.testing.utils.ITRetryUtil - Trying attempt[0/240]...
2022-06-14T18:33:50,358 WARN [HttpClient-Netty-Worker-14] org.apache.druid.java.util.http.client.pool.ResourcePool - Resource at key[http://127.0.0.1:8590] was returned multiple times?
2022-06-14T18:33:50,358 ERROR [main] org.apache.druid.testing.utils.DruidClusterAdminClient - Error while waiting for [http://127.0.0.1:8590] to be ready
java.util.concurrent.ExecutionException: java.io.IOException: Connection reset by peer
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
	at org.apache.druid.testing.utils.DruidClusterAdminClient.lambda$waitUntilInstanceReady$1(DruidClusterAdminClient.java:268) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
	at org.apache.druid.testing.utils.ITRetryUtil.retryUntil(ITRetryUtil.java:61) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
	at org.apache.druid.testing.utils.ITRetryUtil.retryUntilTrue(ITRetryUtil.java:39) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
	at org.apache.druid.testing.utils.DruidClusterAdminClient.waitUntilInstanceReady(DruidClusterAdminClient.java:262) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
	at org.apache.druid.testing.utils.DruidClusterAdminClient.waitUntilOverlordTwoReady(DruidClusterAdminClient.java:140) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
	at org.apache.druid.tests.leadership.ITHighAvailabilityTest.lambda$swapLeadersAndWait$7(ITHighAvailabilityTest.java:263) ~[test-classes/:?]
	at org.apache.druid.tests.leadership.ITHighAvailabilityTest.swapLeadersAndWait(ITHighAvailabilityTest.java:266) ~[test-classes/:?]
	at org.apache.druid.tests.leadership.ITHighAvailabilityTest.testLeadershipChanges(ITHighAvailabilityTest.java:125) ~[test-classes/:?]

This PR did change this particular test case, but in a different test function. Some things to note:

  • This test passed on a previous run for this PR. The change that triggered the re-run was trivial: a change in a documentation file.
  • The test failed on attempt 0 of 240. The retry mechanism (which is generally over-aggressive) apparently did not kick in this time, yet the reported failure is that the maximum number of retries was exceeded.
  • There is a 307 Temporary Redirect response in the log. Perhaps the tests don't handle the transient case in which a redirect occurs?
  • Perhaps unrelated, but there is an entry for "Resource at key[http://127.0.0.1:8590] was returned multiple times?"
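The observations above raise one concrete question: is a transient failure (an `IOException` such as "Connection reset by peer", or a 307 redirect) treated as a retryable attempt, or does it end the wait early? The sketch below shows the intended behavior under stated assumptions. It is not Druid's `ITRetryUtil` or `DruidClusterAdminClient`; the class `ReadinessRetry`, its methods, and the choice to treat a 307 as "not ready yet" are all hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (NOT Druid's ITRetryUtil): retries a readiness check,
// counting exceptions and "not ready" responses as failed attempts.
final class ReadinessRetry {

  private ReadinessRetry() {}

  /** Assumption: only 200 means ready; 307 Temporary Redirect means "not yet". */
  static boolean isReady(int statusCode) {
    return statusCode == 200;
  }

  /**
   * Calls check up to maxAttempts times, sleeping delayMillis between attempts.
   * Returns the 1-based number of attempts used; throws IllegalStateException
   * if the check never returns true.
   */
  static int retryUntilTrue(Callable<Boolean> check, int maxAttempts, long delayMillis, String description) {
    Exception lastError = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        if (Boolean.TRUE.equals(check.call())) {
          return attempt + 1;
        }
      } catch (InterruptedException e) {
        // Preserve the interrupt and stop waiting.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for " + description, e);
      } catch (Exception e) {
        // Transient failure (e.g. connection reset): record it and keep retrying.
        lastError = e;
        System.err.printf("Attempt [%d/%d] for %s failed: %s%n", attempt, maxAttempts, description, e);
      }
      if (attempt + 1 < maxAttempts) {
        try {
          Thread.sleep(delayMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for " + description, ie);
        }
      }
    }
    throw new IllegalStateException(
        "Max number of retries [" + maxAttempts + "] exceeded for " + description, lastError);
  }
}
```

With this shape, a connection reset followed by a 307 and then a 200 succeeds on the third attempt instead of failing on attempt 0.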

This particular test has been redone in the "new IT" PR, but we're stuck with the old version in the present PR.

paul-rogers, Jun 14 '22

Failed again in this build.

paul-rogers, Jun 22 '22

There is also this error

599166320, Oct 08 '22

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions[bot], Dec 31 '23

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions[bot], Jan 29 '24