[AUTOCUT] Gradle Check Flaky Test Report for ReactorNetty4StreamingStressIT

Open opensearch-ci-bot opened this issue 1 year ago • 10 comments

Flaky Test Report for ReactorNetty4StreamingStressIT

Noticed that ReactorNetty4StreamingStressIT has some flaky tests that failed during post-merge actions.

Details

| Git Reference | Merged Pull Request | Build Details | Test Name |
|---|---|---|---|
| 087e4735fbd4644957df29fc9cab074bcaafefca | 17857 | 55995 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 0d1bb9b1f1a86815f19cb867b0ddab8e1e9d31df | 18035 | 56909 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 10fb8527e64dabbdc0a50ba0aae10ff30faecb8f | 17631 | 55237 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 115de22102b68e62c0d6f818c4e083c59008c72a | 17753 | 55759 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 18a3b75fb14217d39700fba367617d37f723293d | 17796 | 56046 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 26beb0f9178543b80d8f2692a280d2bba436ce35 | 17996 | 56710 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 374ad774dc8f443f400e0f059e83d672066c8a33 | 17844 | 55886 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 5fb4e6951b6c80db5c9c45398ed3704ac4092ba3 | 17447 | 56255 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 693c7884e2fae72f29b3d484f001f8e210195357 | 17921 | 56337 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 6c0a95b9e1658e3ecb7cabd0cde183c40902f144 | 17605 | 54623 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| cd8fa4f14e713d5448c0779677b7f38f76c5dc42 | 17887 | 56079 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| d29e95c0dbaf5716d128e0177e8151bba7dc959e | 17882 | 56063 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| e6ffc62a6bc01f504d13fcf924a1061f57148b9e | 17609 | 54672 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| ebd743a50cd7162f1552568c367b60dea077774e | 17642 | 54804 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| bd2119f782ece1fd3f477fd1613077bf1737e986 | 18468 | 58962 | org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| d64baa6808a14fa021b16972459257b43ac6b7da | 15637 | 47004 | org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| fae145307c3c92cc77d4a1e2475b0069953b25c3 | 18116 | 57382 | org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |

The other pull requests, besides those involved in post-merge actions, that contain failing tests with the ReactorNetty4StreamingStressIT class are:

For more details on the failed tests, refer to the OpenSearch Gradle Check Metrics dashboard.

opensearch-ci-bot, Sep 06 '24

Closing; the failure is a test suite timeout:

java.lang.Exception: Test abandoned because suite timeout was reached.
	at __randomizedtesting.SeedInfo.seed([49B56BB27F82E2AA]:0)

reta, Mar 18 '25

@reta Unfortunately this failed again on PR #18060 which did contain the change from #18008:

REPRODUCE WITH: ./gradlew ':plugins:transport-reactor-netty4:javaRestTest' --tests "org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest" -Dtests.seed=D6B2DB4CE5B44876 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-US -Dtests.timezone=Etc/GMT-7 -Druntime.java=21

ReactorNetty4StreamingStressIT > testCloseClientStreamingRequest FAILED
    java.lang.AssertionError: VerifySubscriber timed out on reactor.core.publisher.FluxMap$MapSubscriber@64f123db
        at __randomizedtesting.SeedInfo.seed([D6B2DB4CE5B44876:4FA6F76359C1B064]:0)
        at reactor.test.MessageFormatter.assertionError(MessageFormatter.java:115)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.pollTaskEventOrComplete(DefaultStepVerifierBuilder.java:1728)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.verify(DefaultStepVerifierBuilder.java:1298)
        at reactor.test.DefaultStepVerifierBuilder$DefaultStepVerifier.verify(DefaultStepVerifierBuilder.java:832)
        at org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest(ReactorNetty4StreamingStressIT.java:80)

andrross, Apr 24 '25

Got it, thanks @andrross. I will take a look shortly, sorry about that.

reta, Apr 24 '25

I spent a few minutes looking into this, but couldn't figure out a fix. This is the relevant code:

https://github.com/opensearch-project/OpenSearch/blob/473665fa1c8a59a42c87a7182872fb47e0a9f439/plugins/transport-reactor-netty4/src/javaRestTest/java/org/opensearch/rest/ReactorNetty4StreamingStressIT.java#L69-L80

When it fails, it seems to satisfy the first "onNext" expectation, then closes the client (which takes 5 seconds due to the graceful shutdown of the backing executor), then advances the time on the scheduler, but never receives the expected error. The verifier then times out after 10 seconds and fails the test.
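To make the virtual-time behavior described above concrete, here is a minimal, stdlib-only model. All names (VirtualTimeDemo, VirtualClock, run) are hypothetical: this is not Reactor's VirtualTimeScheduler and not the actual test code. It only shows the general rule that, under virtual time, a delayed task runs when time is advanced past its due point, so a timeout that is armed only after the advance never fires. That would look like the reported symptom (time is advanced, but no error arrives); whether this ordering is what actually happens in testCloseClientStreamingRequest is an assumption, not a diagnosis.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, stdlib-only model of a virtual-time scheduler (NOT Reactor's VirtualTimeScheduler).
public class VirtualTimeDemo {

    static final class VirtualClock {
        // A pending task: its due time in virtual nanos, a FIFO tie-breaker, and the work to run.
        private record Task(long dueNanos, long seq, Runnable action) {}

        private final PriorityQueue<Task> queue =
            new PriorityQueue<>(Comparator.comparingLong(Task::dueNanos).thenComparingLong(Task::seq));
        private long nowNanos = 0;
        private long nextSeq = 0;

        // Schedules work relative to the current virtual time; nothing runs until time is advanced.
        void schedule(Duration delay, Runnable action) {
            queue.add(new Task(nowNanos + delay.toNanos(), nextSeq++, action));
        }

        // Moves virtual time forward, running every task that becomes due, in due-time order.
        void advanceTimeBy(Duration d) {
            long target = nowNanos + d.toNanos();
            while (!queue.isEmpty() && queue.peek().dueNanos() <= target) {
                Task t = queue.poll();
                nowNanos = t.dueNanos();
                t.action().run();
            }
            nowNanos = target;
        }
    }

    // Returns true if the simulated 5s timeout error was signalled.
    static boolean run(boolean armTimeoutBeforeAdvance) {
        VirtualClock clock = new VirtualClock();
        AtomicBoolean errorSignalled = new AtomicBoolean(false);
        Runnable armTimeout = () -> clock.schedule(Duration.ofSeconds(5), () -> errorSignalled.set(true));

        if (armTimeoutBeforeAdvance) {
            armTimeout.run();                          // due at t=5s
            clock.advanceTimeBy(Duration.ofSeconds(10)); // runs the timeout task
        } else {
            clock.advanceTimeBy(Duration.ofSeconds(10)); // nothing is scheduled yet
            armTimeout.run();                          // armed too late: due at t=15s, never advanced to
        }
        return errorSignalled.get();
    }

    public static void main(String[] args) {
        System.out.println("timeout armed before advance -> error: " + run(true));
        System.out.println("timeout armed after advance  -> error: " + run(false));
    }
}
```

The queue orders tasks by due time so that a single large advance runs them in the same order real time would.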

andrross, Apr 29 '25

Yeah, the logic seems sound, but the test is still not stable. I will keep looking; sorry it is taking a bit longer.

reta, Apr 30 '25

@reta I had trouble reproducing this, but I can get it to fail in the same way by changing the client close call to be client.close(CloseMode.IMMEDIATE) so that it does not do the graceful shutdown. Not sure if that's helpful.
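For intuition on why switching from a graceful to an immediate close changes the timing, here is a rough stdlib analogy using a plain ExecutorService as the "backing executor": shutdown() plus awaitTermination lets in-flight work finish, while shutdownNow() interrupts it, so the close step returns almost at once. This is an analogy under stated assumptions, not Apache HttpClient's implementation of CloseMode; CloseModeDemo and timeShutdown are hypothetical names, and the 500 ms task and 5 s await bound are arbitrary choices.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only: a plain ExecutorService standing in for a client's backing executor.
public class CloseModeDemo {

    // Starts one in-flight task, shuts the executor down, and returns how long shutdown took (ms).
    static long timeShutdown(boolean immediate) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        executor.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(500); // simulated in-flight work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdownNow() interrupts the sleep
            }
        });
        started.await(); // ensure the task is actually running before we close

        long begin = System.nanoTime();
        if (immediate) {
            executor.shutdownNow(); // interrupt running work, drop queued work
        } else {
            executor.shutdown();    // let in-flight work finish
        }
        executor.awaitTermination(5, TimeUnit.SECONDS); // bounded wait, like a graceful-close timeout
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("graceful shutdown took ~" + timeShutdown(false) + " ms");
        System.out.println("immediate shutdown took ~" + timeShutdown(true) + " ms");
    }
}
```

The bounded awaitTermination keeps the demo from hanging if a task ignores interruption.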

andrross, Apr 30 '25

New failure here: https://build.ci.opensearch.org/job/gradle-check/58016/

REPRODUCE WITH: ./gradlew ':plugins:transport-reactor-netty4:javaRestTest' --tests "org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest" -Dtests.seed=9151F2D61088B7C9 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=de-LU -Dtests.timezone=US/Pacific -Druntime.java=21

ReactorNetty4StreamingStressIT > testCloseClientStreamingRequest FAILED
    java.lang.AssertionError: expectation "expectNextMatches" failed (expected: onNext(); actual: onError(java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 5000ms in 'flatMapMany' (and no fallback has been configured)))
        at __randomizedtesting.SeedInfo.seed([9151F2D61088B7C9:845DEF9ACFD4FDB]:0)
        at reactor.test.MessageFormatter.assertionError(MessageFormatter.java:115)
        at reactor.test.MessageFormatter.failPrefix(MessageFormatter.java:104)
        at reactor.test.MessageFormatter.fail(MessageFormatter.java:73)
        at reactor.test.MessageFormatter.failOptional(MessageFormatter.java:88)
        at reactor.test.DefaultStepVerifierBuilder.lambda$expectNextMatches$11(DefaultStepVerifierBuilder.java:556)
        at reactor.test.DefaultStepVerifierBuilder$SignalEvent.test(DefaultStepVerifierBuilder.java:2289)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onSignal(DefaultStepVerifierBuilder.java:1529)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onExpectation(DefaultStepVerifierBuilder.java:1477)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onError(DefaultStepVerifierBuilder.java:1129)
        at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:134)
        at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
        at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:296)
        at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:281)
        at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:420)
        at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)
        at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270)
        at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285)
        at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
        at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
        at java.****/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.****/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.****/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.****/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.****/java.lang.Thread.run(Thread.java:1583)

andrross, May 13 '25

Thanks @andrross, looking into it.

reta, May 14 '25

Latest failure from a commit on main that did contain the most recent fix: https://build.ci.opensearch.org/job/gradle-check/58962/

REPRODUCE WITH: ./gradlew ':plugins:transport-reactor-netty4:javaRestTest' --tests "org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest" -Dtests.seed=A4C86660C6C07D13 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-UY -Dtests.timezone=Etc/GMT+3 -Druntime.java=21

ReactorNetty4StreamingStressIT > testCloseClientStreamingRequest FAILED
    java.lang.AssertionError: VerifySubscriber timed out on reactor.core.publisher.FluxMap$MapSubscriber@44d16844
        at __randomizedtesting.SeedInfo.seed([A4C86660C6C07D13:3DDC4A4F7AB58501]:0)
        at reactor.test.MessageFormatter.assertionError(MessageFormatter.java:115)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.pollTaskEventOrComplete(DefaultStepVerifierBuilder.java:1728)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.verify(DefaultStepVerifierBuilder.java:1298)
        at reactor.test.DefaultStepVerifierBuilder$DefaultStepVerifier.verify(DefaultStepVerifierBuilder.java:832)
        at org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest(ReactorNetty4StreamingStressIT.java:91)

andrross, Jun 09 '25

Thanks @andrross, immortal test flakiness; will take a look shortly.

reta, Jun 09 '25