jeromq
jeromq copied to clipboard
Tests still flaky
I ran all the tests in a shell loop during many hours.
It ran 120 iterations. And indeed some tests fails, even if it was a dedicated linux. So it was not only a CI problem. I ran them with openjdk 11.0.6 on a Centos 7.7 with 4 CPU, with the arguments:
mvn clean test -Pcrosscompile -Dsurefire.rerunFailingTestsCount=0
I got the following errors:
org.zeromq.auth.ZAuthTest.testBlacklistDenied :
cause: {'test timed out after 5000 milliseconds': 38}
type: {'org.junit.runners.model.TestTimedOutException': 38}
org.zeromq.ZBeaconTest.testUseBuilder :
cause: {None: 1}
type: {'java.lang.NullPointerException': 1}
org.zeromq.auth.ZAuthTest.testWhitelistDenied :
cause: {'test timed out after 5000 milliseconds': 22}
type: {'org.junit.runners.model.TestTimedOutException': 22}
org.zeromq.ZMsgTest.testClosedContext :
cause: {'\nExpected: is <156384765>\n but: was <4>': 3}
type: {'java.lang.AssertionError': 3}
org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied :
cause: {'test timed out after 5000 milliseconds': 1}
type: {'org.junit.runners.model.TestTimedOutException': 1}
org.zeromq.TestZContext.testSeveralPendingInprocSocketsAreClosedIssue595 :
cause: {None: 1}
type: {'java.lang.NullPointerException': 1}
I also analyzed tests execution times. One test is quite surprising, org.zeromq.TestZPoller.testPollerPollout. Out of 120 ran, 105 ran for less that 0.111. The 15 other ran for about 100 seconds each.
The test zmq.proxy.ProxyTest.testProxy show the same kind of pattern: 65 tests ran for 2s, 55 ran for 42s.
I hope all of this problems are coming from tests themselves and not from Jeromq code.
For the testClosedContext it should check ETERM || EINTR which is what the constant 4 is.
I really don't like the implementation of bindToRandomPort. The Random should be private static final and it's janky. If the underlying ServerSocket is set to 0, a port number will be randomly allocated by the OS typically in the ephemeral range. Re: ZAuthTest
As for bindToRandomPort, I found that binding to tcp://localhost:* and getting the last endpoint using ZMQ.ZMQ_LAST_ENDPOINT to connect the other side was more convenient and less error-prone. Much better than binding to a random port or using Utils.findOpenPort.
is this closable?
There is still a few random failures that shows time to time. But the situation is much better now.
After a night-long run and 233 tests, I get the following results:
org.zeromq.ReqRepTest.testIssue532 :
cause: {'org.zeromq.ZMQException: Errno 4 : Interrupted function': 140, 'test timed out after 5000 milliseconds': 1}
type: {'java.util.concurrent.ExecutionException': 140, 'org.junit.runners.model.TestTimedOutException': 1}
org.zeromq.ReqRepTest.testWaitForeverOnSignalerIssue303 :
cause: {'org.zeromq.ZMQException: Errno 4 : Interrupted function': 114}
type: {'java.util.concurrent.ExecutionException': 114}
org.zeromq.guide.AsyncServerTest.testAsyncServer :
cause: {'test timed out after 10000 milliseconds': 2}
type: {'org.junit.runners.model.TestTimedOutException': 2}
org.zeromq.TestZStar.testNoStar :
cause: {None: 44}
type: {'java.nio.channels.ClosedSelectorException': 44}
org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied :
cause: {'test timed out after 5000 milliseconds': 6}
type: {'org.junit.runners.model.TestTimedOutException': 6}
org.zeromq.auth.ZAuthTest.testWhitelistDenied :
cause: {'test timed out after 5000 milliseconds': 17}
type: {'org.junit.runners.model.TestTimedOutException': 17}
zmq.proxy.ProxyTest.testProxy :
cause: {'test timed out after 10000 milliseconds': 22}
type: {'org.junit.runners.model.TestTimedOutException': 22}
org.zeromq.auth.ZAuthTest.testBlacklistDenied :
cause: {'test timed out after 5000 milliseconds': 12}
type: {'org.junit.runners.model.TestTimedOutException': 12}
zmq.socket.pubsub.XpubXsubTest.testXpubSub :
cause: {'test timed out after 5000 milliseconds': 2}
type: {'org.junit.runners.model.TestTimedOutException': 2}
org.zeromq.guide.EspressoTest.testEspresso :
cause: {'test timed out after 5000 milliseconds': 3}
type: {'org.junit.runners.model.TestTimedOutException': 3}
org.zeromq.ZBeaconTest.testUseBuilder :
cause: {'expected 500, got 488': 1, 'expected 500, got 490': 1}
type: {'java.lang.AssertionError': 2}
org.zeromq.TestZLoop.testZLoopWithUDP :
cause: {'Address already in use': 1}
type: {'java.net.BindException': 1}
And for test time statistics, the top worst standard deviation are:
| test | min | max | median | average | std |
|---|---|---|---|---|---|
| zmq.proxy.ProxyTest.testProxy | 0,274 | 10,013 | 0,708 | 1,56228326 | 2,72795873 |
| org.zeromq.guide.AsyncServerTest.testAsyncServer | 0,038 | 10,013 | 6,262 | 6,16520172 | 1,63683139 |
| org.zeromq.auth.ZAuthTest.testWhitelistDenied | 0,004 | 5,019 | 0,012 | 0,37696996 | 1,30108449 |
| org.zeromq.ParanoidPiratServerWithLazyPiratClientTest.testIssue408 | 32,027 | 34,754 | 32,043 | 32,849515 | 1,1704725 |
| org.zeromq.auth.ZAuthTest.testBlacklistDenied | 0,001 | 5,021 | 0,013 | 0,27116309 | 1,10507894 |
| org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied | 0,001 | 5,025 | 0,02 | 0,14916738 | 0,79139267 |
| org.zeromq.guide.EspressoTest.testEspresso | 0,016 | 5,02 | 0,114 | 0,21654936 | 0,55041531 |
| org.zeromq.HighWatermarkTest.testReliabilityOnWatermark2 | 10,052 | 11,314 | 10,193 | 10,3937897 | 0,44287804 |
| zmq.TestTermEndpoint.testUnbindWildcard | 0,002 | 2,01 | 2,004 | 1,93509871 | 0,36432169 |
| org.zeromq.ZFrameTest.testZMQClientServer | 0 | 5,567 | 0,005 | 0,02898712 | 0,36359313 |
And the slowest are:
| test | min | max | median | average | std |
|---|---|---|---|---|---|
| org.zeromq.ParanoidPiratServerWithLazyPiratClientTest.testIssue408 | 32,027 | 34,754 | 32,043 | 32,849515 | 1,1704725 |
| org.zeromq.PubSubTest.testUnsubscribeIssue554 | 19,338 | 19,504 | 19,441 | 19,4437897 | 0,01288288 |
| org.zeromq.HighWatermarkTest.testReliabilityOnWatermark2 | 10,052 | 11,314 | 10,193 | 10,3937897 | 0,44287804 |
| org.zeromq.HighWatermarkTest.testReliabilityOnWatermark | 10,108 | 11,284 | 11,108 | 11,1135579 | 0,14260533 |
| org.zeromq.guide.AsyncServerTest.testAsyncServer | 0,038 | 10,013 | 6,262 | 6,16520172 | 1,63683139 |
| zmq.proxy.ProxyTest.testProxy | 0,274 | 10,013 | 0,708 | 1,56228326 | 2,72795873 |
| org.zeromq.ZFrameTest.testZMQClientServer | 0 | 5,567 | 0,005 | 0,02898712 | 0,36359313 |
| zmq.io.mechanism.SecurityCurveTest.testCurveMechanismSecurity | 4,812 | 5,22 | 4,985 | 4,98849785 | 0,0916192 |
| org.zeromq.auth.ZAuthTest.testPlainWithPasswordDenied | 0,001 | 5,025 | 0,02 | 0,14916738 | 0,79139267 |
| org.zeromq.auth.ZAuthTest.testBlacklistDenied | 0,001 | 5,021 | 0,013 | 0,27116309 | 1,10507894 |
I wonder if some are linked to issue #723.