jeromq icon indicating copy to clipboard operation
jeromq copied to clipboard

Tests still flaky

Open fbacchella opened this issue 5 years ago • 6 comments

I ran all the tests in a shell loop during many hours.

It ran 120 iterations. And indeed some tests fails, even if it was a dedicated linux. So it was not only a CI problem. I ran them with openjdk 11.0.6 on a Centos 7.7 with 4 CPU, with the arguments:

mvn clean test -Pcrosscompile -Dsurefire.rerunFailingTestsCount=0

I got the following errors:

org.zeromq.auth.ZAuthTest.testBlacklistDenied :
    cause:  {'test timed out after 5000 milliseconds': 38}
    type:   {'org.junit.runners.model.TestTimedOutException': 38}
org.zeromq.ZBeaconTest.testUseBuilder :
    cause:  {None: 1}
    type:   {'java.lang.NullPointerException': 1}
org.zeromq.auth.ZAuthTest.testWhitelistDenied :
    cause:  {'test timed out after 5000 milliseconds': 22}
    type:   {'org.junit.runners.model.TestTimedOutException': 22}
org.zeromq.ZMsgTest.testClosedContext :
    cause:  {'\nExpected: is <156384765>\n     but: was <4>': 3}
    type:   {'java.lang.AssertionError': 3}
org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied :
    cause:  {'test timed out after 5000 milliseconds': 1}
    type:   {'org.junit.runners.model.TestTimedOutException': 1}
org.zeromq.TestZContext.testSeveralPendingInprocSocketsAreClosedIssue595 :
    cause:  {None: 1}
    type:   {'java.lang.NullPointerException': 1}

I also analyzed tests execution times. One test is quite surprising, org.zeromq.TestZPoller.testPollerPollout. Out of 120 ran, 105 ran for less that 0.111. The 15 other ran for about 100 seconds each.

The test zmq.proxy.ProxyTest.testProxy show the same kind of pattern: 65 tests ran for 2s, 55 ran for 42s.

I hope all of this problems are coming from tests themselves and not from Jeromq code.

fbacchella avatar May 11 '20 19:05 fbacchella

For the testClosedContext it should check ETERM || EINTR which is what the constant 4 is.

trevorbernard avatar May 11 '20 20:05 trevorbernard

I really don't like the implementation of bindToRandomPort. The Random should be private static final and it's janky. If the underlying ServerSocket is set to 0, a port number will be randomly allocated by the OS typically in the ephemeral range. Re: ZAuthTest

trevorbernard avatar May 11 '20 20:05 trevorbernard

As for bindToRandomPort, I found that binding to tcp://localhost:* and getting the last endpoint using ZMQ.ZMQ_LAST_ENDPOINT to connect the other side was more convenient and less error-prone. Much better than binding to a random port or using Utils.findOpenPort.

fredoboulo avatar May 12 '20 07:05 fredoboulo

is this closable?

trevorbernard avatar May 25 '20 15:05 trevorbernard

There is still a few random failures that shows time to time. But the situation is much better now.

fbacchella avatar May 25 '20 17:05 fbacchella

After a night-long run and 233 tests, I get the following results:

org.zeromq.ReqRepTest.testIssue532 :
    cause:  {'org.zeromq.ZMQException: Errno 4 : Interrupted function': 140, 'test timed out after 5000 milliseconds': 1}
    type:   {'java.util.concurrent.ExecutionException': 140, 'org.junit.runners.model.TestTimedOutException': 1}
org.zeromq.ReqRepTest.testWaitForeverOnSignalerIssue303 :
    cause:  {'org.zeromq.ZMQException: Errno 4 : Interrupted function': 114}
    type:   {'java.util.concurrent.ExecutionException': 114}
org.zeromq.guide.AsyncServerTest.testAsyncServer :
    cause:  {'test timed out after 10000 milliseconds': 2}
    type:   {'org.junit.runners.model.TestTimedOutException': 2}
org.zeromq.TestZStar.testNoStar :
    cause:  {None: 44}
    type:   {'java.nio.channels.ClosedSelectorException': 44}
org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied :
    cause:  {'test timed out after 5000 milliseconds': 6}
    type:   {'org.junit.runners.model.TestTimedOutException': 6}
org.zeromq.auth.ZAuthTest.testWhitelistDenied :
    cause:  {'test timed out after 5000 milliseconds': 17}
    type:   {'org.junit.runners.model.TestTimedOutException': 17}
zmq.proxy.ProxyTest.testProxy :
    cause:  {'test timed out after 10000 milliseconds': 22}
    type:   {'org.junit.runners.model.TestTimedOutException': 22}
org.zeromq.auth.ZAuthTest.testBlacklistDenied :
    cause:  {'test timed out after 5000 milliseconds': 12}
    type:   {'org.junit.runners.model.TestTimedOutException': 12}
zmq.socket.pubsub.XpubXsubTest.testXpubSub :
    cause:  {'test timed out after 5000 milliseconds': 2}
    type:   {'org.junit.runners.model.TestTimedOutException': 2}
org.zeromq.guide.EspressoTest.testEspresso :
    cause:  {'test timed out after 5000 milliseconds': 3}
    type:   {'org.junit.runners.model.TestTimedOutException': 3}
org.zeromq.ZBeaconTest.testUseBuilder :
    cause:  {'expected 500, got 488': 1, 'expected 500, got 490': 1}
    type:   {'java.lang.AssertionError': 2}
org.zeromq.TestZLoop.testZLoopWithUDP :
    cause:  {'Address already in use': 1}
    type:   {'java.net.BindException': 1}

And for test time statistics, the top worst standard deviation are:

test min max median average std
zmq.proxy.ProxyTest.testProxy 0,274 10,013 0,708 1,56228326 2,72795873
org.zeromq.guide.AsyncServerTest.testAsyncServer 0,038 10,013 6,262 6,16520172 1,63683139
org.zeromq.auth.ZAuthTest.testWhitelistDenied 0,004 5,019 0,012 0,37696996 1,30108449
org.zeromq.ParanoidPiratServerWithLazyPiratClientTest.testIssue408 32,027 34,754 32,043 32,849515 1,1704725
org.zeromq.auth.ZAuthTest.testBlacklistDenied 0,001 5,021 0,013 0,27116309 1,10507894
org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied 0,001 5,025 0,02 0,14916738 0,79139267
org.zeromq.guide.EspressoTest.testEspresso 0,016 5,02 0,114 0,21654936 0,55041531
org.zeromq.HighWatermarkTest.testReliabilityOnWatermark2 10,052 11,314 10,193 10,3937897 0,44287804
zmq.TestTermEndpoint.testUnbindWildcard 0,002 2,01 2,004 1,93509871 0,36432169
org.zeromq.ZFrameTest.testZMQClientServer 0 5,567 0,005 0,02898712 0,36359313

And the slowest are:

test min max median average std
org.zeromq.ParanoidPiratServerWithLazyPiratClientTest.testIssue408 32,027 34,754 32,043 32,849515 1,1704725
org.zeromq.PubSubTest.testUnsubscribeIssue554 19,338 19,504 19,441 19,4437897 0,01288288
org.zeromq.HighWatermarkTest.testReliabilityOnWatermark2 10,052 11,314 10,193 10,3937897 0,44287804
org.zeromq.HighWatermarkTest.testReliabilityOnWatermark 10,108 11,284 11,108 11,1135579 0,14260533
org.zeromq.guide.AsyncServerTest.testAsyncServer 0,038 10,013 6,262 6,16520172 1,63683139
zmq.proxy.ProxyTest.testProxy 0,274 10,013 0,708 1,56228326 2,72795873
org.zeromq.ZFrameTest.testZMQClientServer 0 5,567 0,005 0,02898712 0,36359313
zmq.io.mechanism.SecurityCurveTest.testCurveMechanismSecurity 4,812 5,22 4,985 4,98849785 0,0916192
org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied 0,001 5,025 0,02 0,14916738 0,79139267
org.zeromq.auth.ZAuthTest.testBlacklistDenied 0,001 5,021 0,013 0,27116309 1,10507894

I wonder if some are linked to issue #723.

fbacchella avatar May 26 '20 06:05 fbacchella