FlakyIT: ITHighAvailabilityTest
The integration test ITHighAvailabilityTest failed in this build:
[ERROR] Failures:
[ERROR] ITHighAvailabilityTest.testCoordinatorCluster:207 » ISE Max number of retries[...
Details:
2022-06-14T18:33:50,356 INFO [main] org.apache.druid.testing.utils.DruidClusterAdminClient - 307 Temporary Redirect
2022-06-14T18:33:50,356 INFO [main] org.apache.druid.testing.utils.ITRetryUtil - Trying attempt[0/240]...
2022-06-14T18:33:50,358 WARN [HttpClient-Netty-Worker-14] org.apache.druid.java.util.http.client.pool.ResourcePool - Resource at key[http://127.0.0.1:8590] was returned multiple times?
2022-06-14T18:33:50,358 ERROR [main] org.apache.druid.testing.utils.DruidClusterAdminClient - Error while waiting for [http://127.0.0.1:8590] to be ready
java.util.concurrent.ExecutionException: java.io.IOException: Connection reset by peer
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
at org.apache.druid.testing.utils.DruidClusterAdminClient.lambda$waitUntilInstanceReady$1(DruidClusterAdminClient.java:268) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
at org.apache.druid.testing.utils.ITRetryUtil.retryUntil(ITRetryUtil.java:61) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
at org.apache.druid.testing.utils.ITRetryUtil.retryUntilTrue(ITRetryUtil.java:39) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
at org.apache.druid.testing.utils.DruidClusterAdminClient.waitUntilInstanceReady(DruidClusterAdminClient.java:262) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
at org.apache.druid.testing.utils.DruidClusterAdminClient.waitUntilOverlordTwoReady(DruidClusterAdminClient.java:140) ~[druid-integration-tests-0.24.0-SNAPSHOT.jar:0.24.0-SNAPSHOT]
at org.apache.druid.tests.leadership.ITHighAvailabilityTest.lambda$swapLeadersAndWait$7(ITHighAvailabilityTest.java:263) ~[test-classes/:?]
at org.apache.druid.tests.leadership.ITHighAvailabilityTest.swapLeadersAndWait(ITHighAvailabilityTest.java:266) ~[test-classes/:?]
at org.apache.druid.tests.leadership.ITHighAvailabilityTest.testLeadershipChanges(ITHighAvailabilityTest.java:125) ~[test-classes/:?]
This PR did change this test class, but in a different test method. Some things to note:
- This test passed on a previous run for this PR. The change that triggered the re-run was trivial: a change in a documentation file.
- The error above was logged on attempt 0 of 240, yet the reported failure is that the maximum number of retries was exceeded. Somehow the retry mechanism (which is generally over-aggressive) didn't kick in this time.
- The log shows a 307 Temporary Redirect (logged at INFO) just before the failure. Perhaps the tests don't handle the transient case in which a redirect occurs?
- Perhaps unrelated, but there is an entry for "Resource at key[http://127.0.0.1:8590] was returned multiple times?"
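To make the expected behavior concrete, here is a minimal sketch of a readiness poll that treats both a thrown exception (such as "Connection reset by peer") and a non-200 response (such as a 307) as "not ready yet" and keeps retrying until the attempts run out. This is not the actual ITRetryUtil or DruidClusterAdminClient code: the class RetrySketch, its methods, and the rule that only a 200 counts as ready are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch, NOT Druid's ITRetryUtil: all names here are illustrative.
final class RetrySketch {

    // Polls `check` up to maxAttempts times. A false result or a thrown exception
    // counts as a failed attempt. Returns the zero-based index of the attempt
    // that succeeded, or throws once every attempt has failed.
    static int retryUntilTrue(Callable<Boolean> check, int maxAttempts, long delayMillis) {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(check.call())) {
                    return attempt;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            } catch (Exception e) {
                // Transient failure (for example a connection reset): remember it and retry.
                last = e;
            }
            if (attempt < maxAttempts - 1) {
                sleep(delayMillis);
            }
        }
        throw new IllegalStateException("Max number of retries[" + maxAttempts + "] exceeded", last);
    }

    // Assumed readiness rule: only a 200 means ready; a 307 is treated as "not yet".
    static boolean isReady(int httpStatus) {
        return httpStatus == 200;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Simulated instance: connection reset, then a 307, then a 200.
        final int[] calls = {0};
        int attempt = retryUntilTrue(() -> {
            int n = calls[0]++;
            if (n == 0) {
                throw new IOException("Connection reset by peer");
            }
            return isReady(n == 1 ? 307 : 200);
        }, 240, 10L);
        System.out.println("Ready on attempt " + attempt);
    }
}
```

With a loop shaped like this, a single connection reset on attempt 0 would be retried instead of surfacing as a failure, which is why the log above is surprising.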
This particular test has been redone in the "new IT" PR, but we're stuck with the old version in the present PR.
Failed again in this build.
There is also this error