kop icon indicating copy to clipboard operation
kop copied to clipboard

Flaky-test: PendingTopicFuturesTest#testExceptionalComplete

Open BewareMyPower opened this issue 3 years ago • 3 comments

PendingTopicFuturesTest is flaky. The testExceptionalComplete test method fails sporadically.

Add invocationCount = 100 and test it in my local env. There is one time it failed.

18:40:00.342 [TestNG-method=testExceptionalComplete-1:io.streamnative.pulsar.handlers.kop.PendingTopicFuturesTest@113] INFO  io.streamnative.pulsar.handlers.kop.PendingTopicFuturesTest - exceptionMessages: [error, error, error, error, error, error, error, error, error, error]
18:40:00.343 [TestNG-method=testExceptionalComplete-1:io.streamnative.pulsar.handlers.kop.PendingTopicFuturesTest@117] INFO  io.streamnative.pulsar.handlers.kop.PendingTopicFuturesTest - Changes of pending futures' count: [1, 2, 3, 4, 1, 0, 0, 0, 0, 0]


java.lang.AssertionError: Lists differ at element [4]: 5 != 1 expected [5] but found [1]
Expected :5
Actual   :1
<Click to see difference>


	at org.testng.Assert.fail(Assert.java:96)
	at org.testng.Assert.failNotEquals(Assert.java:776)
	at org.testng.Assert.assertEqualsImpl(Assert.java:137)
	at org.testng.Assert.assertEquals(Assert.java:831)
	at org.testng.Assert.assertEquals(Assert.java:796)
	at io.streamnative.pulsar.handlers.kop.PendingTopicFuturesTest.testExceptionalComplete(PendingTopicFuturesTest.java:120)
	at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
	at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

BewareMyPower avatar Dec 01 '21 10:12 BewareMyPower

I looked into this a bit and it seems to be a bug in the tests. The registered listeners would race with the future completion, which means there could be cases where the future has just completed and the already registered listeners are executing, and at the same time a new listener is registered, which causes a non-zero return value for count() which is smaller than previous count() return values.

The reason why this happens in testExceptionalComplete rather than testNormalComplete is probably because while the topicFuture waits for 800 ms, testNormalComplete sleeps for 234ms between adding listeners while testExceptionalComplete sleeps for 200ms which means the race condition is more likely since it divides 800 evenly.

Does that make sense? If so I think we should probably make the test more explicit - e.g., start a future and block it, register a few listeners, make sure they queue up, unblock the future, verify the registered listener methods are executed, and then register a new listener and check it doesn't block. If that makes sense I'll open a PR.

ZiyaoWei avatar Jun 17 '22 04:06 ZiyaoWei

Yeah, I also found this reason but forgot to update. Thanks for your help!

BewareMyPower avatar Jun 20 '22 10:06 BewareMyPower

Will work on a fix then, thanks!

ZiyaoWei avatar Jun 21 '22 00:06 ZiyaoWei

Another failure: https://github.com/streamnative/kop/actions/runs/3591912498/jobs/6047012544

@Demogorgon314 Could you take over this issue since Ziyao is off?

BewareMyPower avatar Dec 01 '22 12:12 BewareMyPower

@BewareMyPower I will take over.

Demogorgon314 avatar Dec 01 '22 14:12 Demogorgon314