OpenSearch
[AUTOCUT] Gradle Check Flaky Test Report for RecoveryWhileUnderLoadIT
Flaky Test Report for RecoveryWhileUnderLoadIT
I noticed that RecoveryWhileUnderLoadIT has some flaky tests that failed during post-merge actions.
Details
| Git Reference | Merged Pull Request | Build Details | Test Name |
|---|---|---|---|
| d0c2e39ae05454775b8063e09a88dd5f5834c49f | 17797 | 59086 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.classMethod<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileRelocating {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasTest {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithDerivedSource {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWithRelocationAndDerivedSource {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoveryWithDerivedSourceEnabled {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testReplicaRecoveryWithDerivedSourceFromTranslog {p0={"cluster.indices.replication.strategy":"SEGMENT"}} |
| ec5addab82d459743c5c6bb579e6573ecd610e03 | 18500 | 59161 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.classMethod<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileRelocating {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasTest {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithDerivedSource {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWithRelocationAndDerivedSource {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoveryWithDerivedSourceEnabled {p0={"cluster.indices.replication.strategy":"SEGMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testReplicaRecoveryWithDerivedSourceFromTranslog {p0={"cluster.indices.replication.strategy":"SEGMENT"}} |
| b3ad02aad87370205d8bd80979b44980b64aadc6 | 18421 | 59113 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.classMethod<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileRelocating {p0={"cluster.indices.replication.strategy":"DOCUMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithDerivedSource {p0={"cluster.indices.replication.strategy":"DOCUMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes {p0={"cluster.indices.replication.strategy":"DOCUMENT"}} |
| 7116a2c0633a425851393288d7cfa59911e10cf8 | 15138 | 45268 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.classMethod<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"SEGMENT"}} |
| 1bb42ecfafad91528d2b869579c0e9e0fbfca130 | 14508 | 41533 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}} |
| 528e2b0073af8c1557c528d1bdf360183ae011a4 | 17855 | 59021 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.classMethod<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithDerivedSource {p0={"cluster.indices.replication.strategy":"SEGMENT"}} |
| eb5035398967510165fcab4ff4664fd3e80e2cce | 15418 | 45418 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}}<br>org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}} |
| 8d3386cd1f657b0f885d3f5431769a414ff1b43b | 18043 | 57206 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}} |
| 9a3fc307d48a800f96241bb26d4c2f46790a3db3 | 18003 | 57018 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {p0={"cluster.indices.replication.strategy":"DOCUMENT"}} |
| c92b8ea8742b7ae48e5b169c02a255543c4c7b5d | 18435 | 59128 | org.opensearch.recovery.RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithDerivedSource {p0={"cluster.indices.replication.strategy":"SEGMENT"}} |
Other pull requests (besides those involved in post-merge actions) that contain failing tests in the RecoveryWhileUnderLoadIT class:
- 18504
- 18070
- 18509
- 18465
- 18488
- 18351
- 18375
- 18501
- 18048
- 18277
- 18405
- 18497
- 18495
- 18498
- 17439
- 18054
- 18346
- 18516
- 18099
- 18109
- 18229
- 18454
- 18467
- 18483
- 18491
- 13172
- 13637
- 13655
- 13817
- 14533
- 17718
- 17730
- 17791
- 17907
- 18006
- 18017
- 18060
- 18064
- 18068
- 18073
- 18090
- 18092
- 18101
- 18119
- 18135
- 18479
For more details on the failed tests, refer to the OpenSearch Gradle Check Metrics dashboard.
It looks like something related to this test changed about a week ago. See the dashboard.
PR #18054 recently added test cases here. @tanik98 @shwetathareja, can you take a look?
All the recovery-related ITs modified in #18054 now seem to be much flakier:
@msfroh @rishabhmaurya What do you think? Should we revert #18054? I'm seeing quite a lot of failures.
I am in favor of reverting. There was an attempt to fix tests, but it was evidently not sufficient.
+1
Hey everyone, I've hit this flaky test twice on different Gradle checks. Should I wait for a fix/PR before I run the Gradle check again?
The failure in testRecoverWhileUnderLoadWithDerivedSource seems to be due to a source mismatch.
There are 2 translog entries being compared:
- Directly written by the replica
- Snapshot received from Peer recovery flow.
The latter derives the source, so the assertion in the test can fail: the derived source may differ structurally from the user-provided source even though the two objects are equivalent. We should be able to fix the test with an improved check.
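A minimal sketch of what an "improved check" could look like, in Java (the project's language). It assumes the test can obtain both sources already parsed into `Map`/`List`/scalar trees rather than as raw bytes or strings. `SourceEquivalence` and its methods are hypothetical helpers, not existing OpenSearch test utilities. The idea is to compare the parsed objects, where key order does not matter, instead of their serialized text, where it does. Key order is used here only as an illustrative kind of structural difference; if derived source also changes value types or array contents, those cases would need extra normalization that this sketch does not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper (not part of OpenSearch): compares two _source documents
// that have already been parsed into Map/List/scalar trees.
final class SourceEquivalence {

    private SourceEquivalence() {}

    /** Builds an insertion-ordered JSON-like object from alternating keys and values. */
    static Map<String, Object> object(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put((String) keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** Structural check: Map.equals compares entry sets, so key order is ignored. */
    static boolean equivalent(Map<String, Object> expected, Map<String, Object> actual) {
        // Nested maps and lists are compared recursively by their own equals();
        // list (JSON array) order still matters.
        return Objects.equals(expected, actual);
    }

    /** Naive check mirroring a text comparison: LinkedHashMap.toString follows insertion order. */
    static boolean identicalText(Map<String, Object> expected, Map<String, Object> actual) {
        return String.valueOf(expected).equals(String.valueOf(actual));
    }
}
```

For example, a user-provided source `object("name", "doc-1", "count", 3)` and a derived source `object("count", 3, "name", "doc-1")` fail `identicalText` but pass `equivalent`, which is the distinction the flaky assertion seems to be missing.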