OpenSearch
[AUTOCUT] Gradle Check Flaky Test Report for MinimumClusterManagerNodesIT
Flaky Test Report for MinimumClusterManagerNodesIT
MinimumClusterManagerNodesIT has flaky tests that failed during post-merge actions.
Details
| Git Reference | Merged Pull Request | Build Details | Test Name |
|---|---|---|---|
| 6049587461bb001dcae616c76399173817ce81ed | 14040 | 40080 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock, org.opensearch.cluster.MinimumClusterManagerNodesIT.classMethod |
| 8cf7f9259e69c90ed42763e17c3e7896f8a41c5c | 16033 | 48311 | org.opensearch.cluster.MinimumClusterManagerNodesIT.classMethod, org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 9675c4f6ec0d412993ef361bce44a8b789bff27b | 14465 | 41398 | org.opensearch.cluster.MinimumClusterManagerNodesIT.classMethod, org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 0d780b68e900bd99319b2c4ea3b7d567f8b121e5 | 15121 | 44058 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 2eb148cdffd32058c40d6703cbb4a06eb2a2cba3 | 15677 | 47308 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 3fa710b1ea46eee41130dfab06ccf7cbfb27b8e4 | 15648 | 46708 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 50f411e733ad90b54e7dfa69e85702b7e24ebe49 | 15582 | 46459 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 725ed36e85bba5b99ee34fb4f0813409247106c5 | 15783 | 47574 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 96fdbfdb87c41235e99697e04b9a0cc0adefb7bc | 16385 | 49760 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 9cd2635ec3495a7222bde2137281f78949307f49 | 15483 | 45625 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a05d6d1a0f44920fff93080942f7f5a8d3b10bb9 | 15905 | 47730 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| b35690c886f42d2ca01fa3081e80cb4ba4aa19d9 | 14795 | 42953 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| c801270b150083c0f15f8c1f70e3c6d8f731cac0 | 15660 | 46762 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| d56d8c88e07ae416d41197b05103ea2dba393967 | 14489 | 41572 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| fc1bf2c9c7b9858fe60caa3ed7ef09bbd0b30c4f | 15759 | 47451 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 0fc94ca3aa9b12558d898ff05b479b360f71ae0f | 13799 | 39263 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 130500218a794f15df522c3ba5a31acbc77209e4 | 14851 | 43091 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 3ef34558d3884f8a055be8f04c6d98da3428dcb9 | 16388 | 49737 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 56d0b76ac4c636d473177f4f12e854ce1fa6aa64 | 14401 | 41153 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a021bf98e9e4bbc9fd36b694b4053d693dcedc22 | 16325 | 49454 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| acc46316550ee203851d5c622d3b4724646d3f3e | 14587 | 42139 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| afa479b2c5ce9a22220bf2f4de49ae4ca69c3bc7 | 14748 | 42455 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| c89a17cecaf8348d936cd42d3000c1a1fa7cf120 | 13888 | 40047 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 017f7d461ca865da57948719db0b58f40286427c | 15704 | 47021 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 107f0ce6a3c04ea4a759a8cf980e4d23c88ab1b8 | 15867 | 47672 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 1386a9b902c4af0e3cb88a6e7e16861970415b76 | 13930 | 39885 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 2e13e9cb5b3507e9e7e85b73012c7ccd84b6844f | 14107 | 40782 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 3a38a6c86e34c5abbb0eb95d919e585e2af78feb | 14365 | 41154 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 43e7597cdc5ba2c1852ec1796628f948633f0c57 | 16146 | 48697 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 591940911c052cf977812e3e0948b2ad5c922329 | 13945 | 39654 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 67eceaa75d788e20a1e941324210e164939a0991 | 15617 | 46561 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 90148942a56fa6a4840ad2afed195071f2d3c8e6 | 15401 | 45300 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a12e3e6212c6103b64346c2c0a3859e467751337 | 16051 | 48345 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a99fe302966cbff576c68bc2cc22dd38bab70000 | 16074 | 48420 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| dbdc1517dd1bc885d9204aff75d2e4c9ec13eee6 | 15589 | 46329 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| f85a58f64e5aaba76eb519e309881f288aff8fa6 | 14684 | 43162 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| fabf9bd596386dd745685a23b6a1dc52d0f84b7b | 15293 | 44791 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 01c5e5642b7450bba2f3a21acdf8cf13539f65eb | 15750 | 47339 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 03b1306b3cf2f4a37634ea6aca89512803541de6 | 15019 | 43633 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 06698dd292ecd74e86c4ffbb26270ebeabd7ce31 | 14922 | 43325 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 08c19327417afad9003236725920efc8a3abfa9b | 16102 | 48602 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 0c2ff039890c9e891da068ba401a7a77683c4a5b | 14230 | 41005 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 0eb2ec0bdbdb4f9f1f027ed108755fbae0d232f1 | 16348 | 49555 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 0ff0439dce988344c76ec0d68643bef528c652b6 | 16306 | 49410 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 11f8d79a96494fc6031894e28008da57bc3fe153 | 14716 | 42338 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 1562100eeaa9d8e108c6bc21a4030687d729fa1c | 15400 | 45193 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 1bee506f8a6695c235d749ea90676841c3121e3c | 15227 | 44685 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 201c673e980016ebc3c67e85b9a4d0fa684460b0 | 14458 | 41410 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 234c4da5d6e679e718c93e303f0b8bf65fbd7d5e | 16026 | 48205 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 23d1c7a55a63250b962c1fad4e6fb962fdd156cc | 16282 | 49484 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 36cb9ebb61f2ac5d0350cffb0cd381a2488d7cd0 | 16275 | 49252 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 3a1be63f3445bb38bea5898742a2b195c1c26251 | 14639 | 41976 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 3ddb199a77b73364cce725a8dcf594ab572b3d2a | 15586 | 46999 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 4038a3c1e4e6a43460be49f5205e745133bea4c6 | 14074 | 40854 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 468f120141b6b472a143034fe59c12fed06b4a35 | 15724 | 47162 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 4c7d94cece85b3dd1a6de2df0efd22914c1fb9a5 | 14839 | 42902 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 57a597fe2d68f283790a3658d38f7ceb39e25c72 | 15494 | 45869 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 5bb2e2851d7a2986e59548826c8d935264f523e4 | 15200 | 44343 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 6021bcaa68ef05dc9435ea9e3d8b2eb2aa6e8fad | 16280 | 49278 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 64383dd84bc2ce1370febaec9a3c3c8dea0cf81a | 14561 | 41702 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 71d122b9013f72c5e28a9c3240f4c7f9491aecf2 | 15554 | 46063 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 7650e6412056f0b06e069ae6b2936f9ea2da4a7f | 14345 | 41056 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 7a58f5e32fc4ad8c48cb401c4b516fb4cd09856f | 16193 | 48947 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 7dbaf25aa1b1b33b09ef0eeb4df92f41225fd0fc | 16176 | 48836 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 7e7e77504c6a0f20b3ba49786057ab906b9ea880 | 14864 | 43002 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 802f2e6e4b21f27ddc6c01e7fc6f6cdcd69138d3 | 14424 | 41253 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 887698d22fbca28f29c8ffc0f635228ac209d6a1 | 15132 | 44111 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 8e32ed736372aa90db4c0ce3b85888b7b473a337 | 14394 | 41336 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 903784b0afe756ee9f3e5eed7120f2289b207682 | 14414 | 41239 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 913013bd5c6b43d8337a97a7753bc2f10f36eae4 | 13948 | 39666 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| 931339e38be8f29281501a5ac8f0dddf2aa2232d | 16311 | 49422 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a0a7098fda852eb18b0aa7d7aea23c6abdb497e7 | 14884 | 43148 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a17aea599e56c18c07767bf50d5b9603ccf2e315 | 14710 | 42244 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| a968790ed5f4e47f96271483246842989520411e | 15932 | 47815 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| ae22e3ff32ef15a6af302c50872f1fa0e8e140fe | 16065 | 48417 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| afeddc228ba7791a549fb7c6ef94349d432c0824 | 14037 | 40910 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| b8c78196438897132f6819460ebb7d4222b39297 | 12782 | 41391 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| bb013dadc797bd3349a630444d59b6e9b6b96429 | 13717 | 39669 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| bde48a7b925b6cb20099c9d31023127288d3fb02 | 14133 | 40532 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| bf4367877eab27dff05a74d683d14e820130172d | 13809 | 39614 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| c49eca4061d3af9af77a3eacd28043200343ba98 | 13721 | 40576 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| cad81b0e468164f5d58aaa83ca4b3d2f462c4990 | 15216 | 45855 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| d1cd7a2b8ba24a5e5ef3278315efd589e8c6eeee | 15512 | 45861 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| d2bc9fc3daaaa33273bace58c4a94d2ae3e7be5c | 15656 | 46780 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| d5c4081100ebe30e9dc84bb9d86003183b489bfd | 16130 | 48628 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| d7b011612014a78283c56425d493550b64ad2b5b | 16250 | 49186 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| e1a632fd8a88b0fad3d11708dd389c88eb0eeaa3 | 14340 | 40973 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| e67ced73226453d5a5504c78f3b7d5ae90b4914e | 13784 | 39156 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| edcbfd49f0e047bf34fc88c9aeca4a20fde5ee45 | 14923 | 43290 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| eeb2f3997bb84f33f13b848e125051ecf2c2a1c7 | 16048 | 48315 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| f9d15df3a14b4ae32aeda3931867fc72dfd990c2 | 15715 | 47146 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
| fef20032943378c02f8f3424865395058989e186 | 15181 | 44308 | org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock |
The other pull requests, besides those involved in post-merge actions, that contain failing tests with the MinimumClusterManagerNodesIT class are:
- 14119
- 13655
- 15774
- 15295
- 13821
- 16321
- 14774
- 14696
- 14853
- 15448
- 15569
- 15890
- 15943
- 13772
- 13788
- 14652
- 15145
- 15343
- 13957
- 14069
- 14135
- 14977
- 15017
- 15239
- 15291
- 15327
- 15346
- 15372
- 15381
- 15570
- 15622
- 16292
- 16361
- 13703
- 14019
- 14115
- 14223
- 14266
- 14383
- 14479
- 14614
- 14630
- 14708
- 14948
- 15177
- 15210
- 15326
- 15410
- 15430
- 15614
- 15662
- 15681
- 15951
- 15980
- 16082
- 16103
- 16212
- 16278
- 16316
- 16392
- 12439
- 13636
- 13813
- 14166
- 14200
- 14399
- 14409
- 14515
- 14617
- 14738
- 14991
- 15370
- 15416
- 15533
- 15613
- 15621
- 15916
- 16091
- 16121
- 16233
- 13374
- 13498
- 13590
- 13708
- 13785
- 13801
- 13807
- 13865
- 13897
- 13924
- 14025
- 14064
- 14167
- 14391
- 14432
- 14533
- 14565
- 14574
- 14641
- 14655
- 14677
- 14735
- 14750
- 14759
- 14885
- 14925
- 14963
- 14972
- 14993
- 15079
- 15153
- 15278
- 15289
- 15290
- 15432
- 15442
- 15521
- 15526
- 15542
- 15573
- 15664
- 15667
- 15668
- 15705
- 15939
- 15972
- 15973
- 16143
- 16201
- 16206
- 16239
- 11573
- 12016
- 13131
- 13172
- 13315
- 13474
- 13525
- 13574
- 13632
- 13637
- 13678
- 13681
- 13684
- 13722
- 13759
- 13810
- 13885
- 13887
- 13895
- 13934
- 13944
- 13969
- 13997
- 14026
- 14095
- 14111
- 14140
- 14151
- 14158
- 14206
- 14238
- 14261
- 14369
- 14371
- 14397
- 14400
- 14402
- 14411
- 14415
- 14426
- 14437
- 14440
- 14454
- 14487
- 14512
- 14528
- 14540
- 14612
- 14613
- 14625
- 14635
- 14651
- 14659
- 14670
- 14673
- 14703
- 14715
- 14717
- 14718
- 14725
- 14761
- 14792
- 14814
- 14817
- 14827
- 14855
- 14874
- 14876
- 14896
- 14905
- 14913
- 14949
- 14962
- 14967
- 14994
- 15012
- 15045
- 15188
- 15218
- 15230
- 15256
- 15305
- 15312
- 15320
- 15330
- 15336
- 15357
- 15374
- 15382
- 15386
- 15395
- 15398
- 15409
- 15426
- 15428
- 15454
- 15465
- 15471
- 15476
- 15496
- 15508
- 15513
- 15523
- 15530
- 15565
- 15574
- 15578
- 15587
- 15591
- 15595
- 15598
- 15611
- 15615
- 15627
- 15636
- 15640
- 15643
- 15649
- 15654
- 15659
- 15671
- 15672
- 15679
- 15689
- 15690
- 15708
- 15740
- 15755
- 15765
- 15777
- 15778
- 15859
- 15894
- 15940
- 15941
- 15949
- 15975
- 15978
- 15986
- 15997
- 16009
- 16021
- 16024
- 16038
- 16042
- 16080
- 16150
- 16154
- 16161
- 16168
- 16181
- 16191
- 16197
- 16204
- 16215
- 16219
- 16238
- 16242
- 16260
- 16267
- 16291
- 16314
- 16331
- 16347
- 16352
- 16353
For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard.
Adding the Storage:Remote label to this one because I believe it has been traced back to a commit related to that feature. From the original issue:
I believe I have traced this back to the commit that introduced the flakiness: 9119b6dc20ea11d95a399c68505f1d858b78e30e (#9105)
The following command will reliably reproduce the failure for me:
./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock" -Dtests.iters=100
If I select the commit immediately preceding 9119b6dc20e then it does not reproduce.
This is a bit concerning: the commit in question is related to the remote store feature, but MinimumClusterManagerNodesIT does not do anything related to remote store, so there may be a significant regression here.
Findings so far, mostly from adding some debug logging statements:
- We start out with 3 nodes, `[node_t0, node_t1, node_t2]`
- We find the set of non-CM nodes, `[node_t1, node_t0]`
- We shut down the non-CM nodes, leaving `[node_t2]`
- We use the local path of the two nodes shut down to start up new nodes, they have the same UUID
- Most of the time when the test passes, the new nodes are renamed, `node_t0->node_t3` and `node_t1->node_t4`.
- When the test fails, it's consistently because the (formerly CM) node still thinks it's in a cluster with `node_t0` and `node_t1` and its cluster state version is 2 versions behind the other two nodes.
- The other two (new) nodes think that `node_t2` is the cluster manager but it hasn't caught up yet.
- The 2nd cluster state update is likely the cluster manager assignment, so the root cause is probably the first cluster state update that is failing on the (formerly cluster manager) node:
java.lang.AssertionError: a started primary with non-pending operation term must be in primary mode [test][1], node[ZdcgPV1JSmut1DojEIhCEw], [P], s[STARTED], a[id=Yl7dClDeQ0Ox4vlafvVO_A]
at __randomizedtesting.SeedInfo.seed([D54CD0A4D377FB88]:0)
at org.opensearch.index.shard.IndexShard.updateShardState(IndexShard.java:840)
at org.opensearch.indices.cluster.IndicesClusterStateService.updateShard(IndicesClusterStateService.java:712)
at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:651)
at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:294)
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:626)
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:612)
at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:580)
at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:503)
at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923)
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
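The version-lag symptom described above (one node's cluster state version trailing the others) can be checked mechanically once per-node versions have been collected, for example from debug logs. The following is a minimal, framework-free sketch; the class `ClusterStateLag`, the method `laggingNodes`, and the version numbers are hypothetical illustrations and are not part of OpenSearch or of the failing test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenSearch: reports nodes whose
// applied cluster state version trails the highest version observed.
public class ClusterStateLag {

    // Returns the names of nodes that are behind the newest version, sorted by name.
    public static List<String> laggingNodes(Map<String, Long> versionByNode) {
        long newest = Long.MIN_VALUE;
        for (long version : versionByNode.values()) {
            newest = Math.max(newest, version);
        }
        List<String> lagging = new ArrayList<>();
        for (Map.Entry<String, Long> entry : versionByNode.entrySet()) {
            if (entry.getValue() < newest) {
                lagging.add(entry.getKey());
            }
        }
        Collections.sort(lagging);
        return lagging;
    }

    public static void main(String[] args) {
        // Illustrative versions only: the former cluster manager is 2 versions behind.
        Map<String, Long> versions = Map.of("node_t2", 10L, "node_t3", 12L, "node_t4", 12L);
        System.out.println(laggingNodes(versions)); // prints [node_t2]
    }
}
```

In a real integration test the versions would have to come from each node's locally applied cluster state rather than a hard-coded map; how to fetch them is left out here because it depends on the test framework.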
Adding a flush after the 2 nodes are randomly dropped seems effective in preventing the flakiness, but it also takes a long time:
client().admin().indices().prepareFlush().execute().actionGet();
Calling refresh() at this point instead fails because there is no cluster manager.
Placing a refresh() between the two node terminations seems to reduce, but not eliminate, the flakiness.
I'm about at the limit of what debug logging can tell me, so I'd suggest that someone with knowledge of the linked PR investigate how that code interacts with the cluster state.