Flaky-test: PersistentStreamingDispatcherBlockConsumerTest.testBrokerSubscriptionRecovery

Open shibd opened this issue 3 years ago • 1 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Example failure

https://github.com/apache/pulsar/runs/7739485769?check_suite_focus=true

Exception stacktrace

  Error:  Tests run: 13, Failures: 1, Errors: 0, Skipped: 9, Time elapsed: 38.276 s <<< FAILURE! - in org.apache.pulsar.broker.service.persistent.PersistentStreamingDispatcherBlockConsumerTest
  Error:  testBrokerSubscriptionRecovery(org.apache.pulsar.broker.service.persistent.PersistentStreamingDispatcherBlockConsumerTest)  Time elapsed: 1.227 s  <<< FAILURE!
  java.lang.AssertionError: expected object to not be null
  	at org.testng.Assert.fail(Assert.java:99)
  	at org.testng.Assert.assertNotNull(Assert.java:942)
  	at org.testng.Assert.assertNotNull(Assert.java:926)
  	at org.apache.pulsar.client.api.DispatcherBlockConsumerTest.testBrokerSubscriptionRecovery(DispatcherBlockConsumerTest.java:622)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
  	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

shibd avatar Aug 09 '22 08:08 shibd

Another instance https://github.com/apache/pulsar/runs/7758231806?check_suite_focus=true

tisonkun avatar Aug 10 '22 02:08 tisonkun

I think this problem was caused by #16968 and fixed by #17018. I'm closing this issue; we can reopen it if the exception occurs again.

mattisonchao avatar Aug 10 '22 11:08 mattisonchao

@mattisonchao I don't think the issue is resolved: https://github.com/apache/pulsar/runs/7766497633?check_suite_focus=true

I've reopened it.

codelipenghui avatar Aug 10 '22 14:08 codelipenghui

A new one: https://github.com/apache/pulsar/runs/7767603985?check_suite_focus=true

codelipenghui avatar Aug 10 '22 16:08 codelipenghui

https://github.com/apache/pulsar/runs/7778850062?check_suite_focus=true

codelipenghui avatar Aug 13 '22 02:08 codelipenghui

This problem was fixed by #17143

mattisonchao avatar Aug 22 '22 01:08 mattisonchao

Another failure: https://github.com/apache/pulsar/runs/8159197059?check_suite_focus=true#step:10:824

Error:  testBrokerSubscriptionRecovery(org.apache.pulsar.broker.service.persistent.PersistentStreamingDispatcherBlockConsumerTest)  Time elapsed: 5.058 s  <<< FAILURE!
  java.lang.AssertionError: expected [true] but found [false]
  	at org.testng.Assert.fail(Assert.java:99)
  	at org.testng.Assert.failNotEquals(Assert.java:1037)
  	at org.testng.Assert.assertTrue(Assert.java:45)
  	at org.testng.Assert.assertTrue(Assert.java:55)
  	at org.apache.pulsar.client.api.DispatcherBlockConsumerTest.lambda$testBrokerSubscriptionRecovery$15(DispatcherBlockConsumerTest.java:658)
  	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
  	at org.apache.pulsar.client.api.DispatcherBlockConsumerTest.testBrokerSubscriptionRecovery(DispatcherBlockConsumerTest.java:658)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
  	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

RobertIndie avatar Sep 02 '22 16:09 RobertIndie

Cause of the problem:

  • Messages are received out of order; see #17418.
  • There may be bugs when streaming dispatch is enabled.
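
To illustrate the first point: a check that compares received messages to the sent sequence position by position fails on reordered delivery even when nothing is lost. The following is a minimal sketch, assuming plain string payloads. The class and helper names are hypothetical and are not taken from DispatcherBlockConsumerTest or from the Pulsar client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrderInsensitiveCheck {

    // Hypothetical helper (not part of the Pulsar test suite):
    // true only when the received payloads match the sent ones in the same order.
    public static boolean sameMessagesInOrder(List<String> sent, List<String> received) {
        return sent.equals(received);
    }

    // Hypothetical helper: true when both lists hold the same payloads
    // (duplicates included), regardless of delivery order.
    public static boolean sameMessagesIgnoringOrder(List<String> sent, List<String> received) {
        List<String> a = new ArrayList<>(sent);
        List<String> b = new ArrayList<>(received);
        Collections.sort(a);
        Collections.sort(b);
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> sent = List.of("msg-0", "msg-1", "msg-2");
        // Same payloads, but the last two arrive swapped.
        List<String> received = List.of("msg-0", "msg-2", "msg-1");

        System.out.println("in order:       " + sameMessagesInOrder(sent, received));
        System.out.println("ignoring order: " + sameMessagesIgnoringOrder(sent, received));
    }
}
```

Whether an order-insensitive comparison is appropriate depends on what the test is meant to guarantee; if ordering is part of the contract, the reordering itself is the bug to fix.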

poorbarcode avatar Sep 02 '22 17:09 poorbarcode

There are some failed checks in PR #16003 because of this flaky test. What can I do about this issue?

shink avatar Sep 17 '22 03:09 shink

> There are some failed checks in PR https://github.com/apache/pulsar/pull/16003 because of this flaky test. What can I do about this issue?

The first thing you should do is determine whether any of the failed tests were caused by the code you submitted. If not, you can comment /pulsarbot rerun-failure-checks to rerun the failed checks.

poorbarcode avatar Sep 17 '22 03:09 poorbarcode

@poorbarcode Thank you for your suggestions! The problem is that all of the failed checks are about this flaky test. They don't seem to be caused by my changes, but they fail again when I re-run them:

  • https://github.com/apache/pulsar/actions/runs/3071925517/jobs/4963032830
  • https://github.com/apache/pulsar/actions/runs/3071925540/jobs/4963093961

shink avatar Sep 17 '22 03:09 shink

Hi @shink

> Thank you for your suggestions! The problem is that all of the failed checks are about this flaky test. They don't seem to be caused by my changes, but they fail again when I re-run them.

We can discuss it in PR #16003.

poorbarcode avatar Sep 17 '22 04:09 poorbarcode

Another one:

  • https://github.com/poorbarcode/pulsar/actions/runs/3105874288/jobs/5032329689

poorbarcode avatar Sep 22 '22 16:09 poorbarcode