[BUG] StableClusterManagerDisruptionIT.testStaleClusterManagerNotHijackingMajority (Random Test Failure)

Open CEHENKLE opened this issue 3 years ago • 7 comments

Describe the bug: Random test failure. Please dig in and figure out what went wrong :(

https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_1066_reports.zip

CEHENKLE • Nov 16 '21 18:11

Adding more information:

https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_1066.log

> Task :server:internalClusterTest

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.discovery.StableMasterDisruptionIT.testStaleMasterNotHijackingMajority" -Dtests.seed=69CC1732A5C19596 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sv -Dtests.timezone=America/Mexico_City -Druntime.java=17

org.opensearch.discovery.StableMasterDisruptionIT > testStaleMasterNotHijackingMajority FAILED
    java.lang.AssertionError: node_t2: [Tuple [v1=node_t1, v2=null]]
        at __randomizedtesting.SeedInfo.seed([69CC1732A5C19596:36CA5A3D841A4A9A]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.opensearch.discovery.StableMasterDisruptionIT.lambda$testStaleMasterNotHijackingMajority$5(StableMasterDisruptionIT.java:253)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1048)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1021)
        at org.opensearch.discovery.StableMasterDisruptionIT.testStaleMasterNotHijackingMajority(StableMasterDisruptionIT.java:250)

tlfeng • Mar 13 '22 02:03
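
The stack trace above shows the `assertTrue` failing inside a lambda that `OpenSearchTestCase.assertBusy` was running, which suggests the expected condition had still not become true when the retry budget ran out. As a rough illustration of that retry-until-deadline pattern, here is a minimal sketch. The class `RetryUntilSketch` and the helper `retryUntil` are hypothetical names invented for this example; this is not OpenSearch's actual `assertBusy` implementation, and the backoff values are arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class RetryUntilSketch {

    /**
     * Hypothetical helper (not an OpenSearch API): polls the condition until it
     * holds or the deadline passes. The condition is always checked at least once.
     * Returns true if the condition held, false if the deadline passed first.
     */
    public static boolean retryUntil(BooleanSupplier condition, long timeoutMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false; // budget exhausted; a test would fail at this point
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 50); // exponential backoff, capped at 50 ms
        }
    }

    public static void main(String[] args) {
        // A condition that only becomes true on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean settled = retryUntil(() -> polls.incrementAndGet() >= 3, 2_000);
        System.out.println("settled within budget: " + settled);

        // A condition that never becomes true exhausts a short budget.
        boolean neverSettled = retryUntil(() -> false, 20);
        System.out.println("never settled: " + neverSettled);
    }
}
```

With this pattern, a condition that settles just slightly later than the budget allows looks the same as one that never settles, which is one reason such tests can fail intermittently on a loaded CI host.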

https://github.com/opensearch-project/OpenSearch/pull/2541#issuecomment-1074479459

saratvemulapalli • Mar 21 '22 22:03

The test was renamed to follow the new naming convention, which uses "cluster manager" instead of "master".

Poojita-Raj • Nov 12 '22 00:11

One more occurrence: https://github.com/opensearch-project/OpenSearch/pull/6838#issuecomment-1484167698

dreamer-89 • Mar 26 '23 17:03

Checking

rahulkarajgikar • Apr 23 '24 05:04

Ran 5000 iterations of the test locally and did not see any failures:

 $ ./gradlew ':server:internalClusterTest' --tests "org.opensearch.discovery.StableClusterManagerDisruptionIT.testStaleClusterManagerNotHijackingMajority" -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sv -Dtests.timezone=America/Mexico_City -Druntime.java=17 -Dtests.iters=5000 -Dtests.timeoutSuite=180000000!
Starting a Gradle Daemon, 1 busy Daemon could not be reused, use --status for details

> Configure project :
========================= WARNING =========================
         Backwards compatibility tests are disabled!
See https://github.com/opensearch-project/OpenSearch/issues/4173
===========================================================
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.4
  OS Info               : Mac OS X 14.3.1 (aarch64)
  Runtime JDK Version   : 17 (Amazon Corretto JDK)
  Runtime java.home     : /Library/Java/JavaVirtualMachines/amazon-corretto-17.jdk/Contents/Home
  Gradle JDK Version    : 21 (Amazon Corretto JDK)
  Gradle java.home      : /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home
  Random Testing Seed   : 9F886D8E98DA3AB1
  In FIPS 140 mode      : false
=======================================
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.BootstrapForTesting (file:/Users/karajgik/workplace/OpenSearch_karajgik/OpenSearch/test/framework/build/distributions/framework-3.0.0-SNAPSHOT.jar)
WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.BootstrapForTesting
WARNING: System::setSecurityManager will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/Users/karajgik/.gradle/wrapper/dists/gradle-8.4-all/56r6xik2f6skrm47et0ibifug/gradle-8.4/lib/plugins/gradle-testing-base-8.4.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 17h 59m 55s
55 actionable tasks: 1 executed, 54 up-to-date

rahulkarajgikar • Apr 30 '24 06:04

The test sets the cluster publish timeout to 1s. I was able to reproduce the failure only after lowering the cluster publish timeout to 10ms.

Although I was not able to reproduce the error with the default values, I will raise a PR to increase the cluster publish timeout in the test to 2s to get rid of the flakiness.

rahulkarajgikar • Apr 30 '24 06:04