server
server copied to clipboard
Some tests fail due to test timing issues when using -count=100 / 1000
As per @rkennedy's comments in #24, various tests fail intermittently and typically exhibit when running go test ./... -count=100
. Backtesting indicates these failures occur at least as early as v1.0.5. The failures can occur as few as 1 in every 500 runs. This is likely to be a problem with the tests rather than a problem with the broker code.
Determine the precise cause for these intermittent failures and correct it.
After running go test ./... -count=100
several times, the known failures are as follows:
writing err io: read/write on closed pipe
--- FAIL: TestServerResendClientInflightBackoff (0.01s)
server_test.go:2096:
Error Trace: server_test.go:2096
Error: Not equal:
expected: []byte{0x3a, 0xe, 0x0, 0x5, 0x61, 0x2f, 0x62, 0x2f, 0x63, 0x0, 0xb, 0x68, 0x65, 0x6c, 0x6c, 0x6f}
actual : []byte{}
Diff:
--- Expected
+++ Actual
@@ -1,3 +1,2 @@
-([]uint8) (len=16) {
- 00000000 3a 0e 00 05 61 2f 62 2f 63 00 0b 68 65 6c 6c 6f |:...a/b/c..hello|
+([]uint8) {
}
Test: TestServerResendClientInflightBackoff
writing err io: read/write on closed pipe
--- FAIL: TestServerResendClientInflightBackoff (0.00s)
server_test.go:2105:
Error Trace: server_test.go:2105
Error: Not equal:
expected: 1
actual : 2
Test: TestServerResendClientInflightBackoff
2022/01/30 10:16:33 error writing to buffer io.Writer; io: read/write on closed pipe
--- FAIL: TestServerWriteClient (0.01s)
server_test.go:671:
Error Trace: server_test.go:671
Error: Not equal:
expected: []byte{0x70, 0x2, 0x0, 0xe}
actual : []byte{}
Diff:
--- Expected
+++ Actual
@@ -1,3 +1,2 @@
-([]uint8) (len=4) {
- 00000000 70 02 00 0e |p...|
+([]uint8) {
}
Test: TestServerWriteClient
2022/01/30 10:19:33 error writing to buffer io.Writer; io: read/write on closed pipe
--- FAIL: TestServerCloseClientLWT (0.01s)
server_test.go:1756:
Error Trace: server_test.go:1756
Error: Not equal:
expected: []byte{0x30, 0xc, 0x0, 0x5, 0x61, 0x2f, 0x62, 0x2f, 0x63, 0x68, 0x65, 0x6c, 0x6c, 0x6f}
actual : []byte{}
Diff:
--- Expected
+++ Actual
@@ -1,3 +1,2 @@
-([]uint8) (len=14) {
- 00000000 30 0c 00 05 61 2f 62 2f 63 68 65 6c 6c 6f |0...a/b/chello|
+([]uint8) {
}
Test: TestServerCloseClientLWT
Please add any others to this issue if you see them.
I've seen this a few times:
--- FAIL: TestServerServe (0.00s)
server_test.go:148:
Error Trace: server_test.go:148
Error: Not equal:
expected: true
actual : false
Test: TestServerServe
On my RHEL and macOS systems (both x86_64), I added the -race
option to go test
(not available on ARM32), and I see a few data races, mainly related to unsyncronized access to various fields of the system.Info
struct. Perhaps that's a separate class of errors from the one tracked in this issue, though.
I believe all but one of these are addressed in #36.
On further investigation, there are two categories of test failure that are still happening here after #36. First, there are tests that use time.Sleep()
that may break under heavy load, some of these will be difficult to fix. For example, in #38 a fix for
TestServerWriteClient()
replaced a ReadAll()
that might race with another Close()
with a Read() specifically tailored to the expected 4 bytes (which makes Close uneccessary).
Second, there are tests that read atomic variables without using an atomic
method., as noted above. Easy to fix.
I believe all but one of these are addressed in #36.
Which one do you feel is not addressed? If we can isolate any remaining issues it would be good to split them into a new ticket and close this one.
Sure! I ran overnight testing and was able to reproduce a few test failures. Every failure I looked into, however, just appeared to be a minor issue in the test, not a problem in the code.
I will be glad to file new, specific issues and close this.
Here's the most frequent one:
--- FAIL: TestServerResendClientInflightBackoff (0.00s)
server_test.go:2208:
Error Trace: server_test.go:2208
Error: Not equal:
expected: []byte{0x3a, 0xe, 0x0, 0x5, 0x61, 0x2f, 0x62, 0x2f, 0x63, 0x0, 0xb, 0x68, 0x65, 0x6c, 0x6c, 0x6f}
actual : []byte{0x3a, 0xe, 0x0, 0x5, 0x61, 0x2f, 0x62, 0x2f, 0x63, 0x0, 0xb, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x3a, 0xe, 0x0, 0x5, 0x61, 0x2f, 0x62, 0x2f, 0x63, 0x0, 0xb, 0x68, 0x65, 0x6c, 0x6c, 0x6f}
Diff:
--- Expected
+++ Actual
@@ -1,3 +1,4 @@
-([]uint8) (len=16) {
+([]uint8) (len=32) {
00000000 3a 0e 00 05 61 2f 62 2f 63 00 0b 68 65 6c 6c 6f |:...a/b/c..hello|
+ 00000010 3a 0e 00 05 61 2f 62 2f 63 00 0b 68 65 6c 6c 6f |:...a/b/c..hello|
}
Test: TestServerResendClientInflightBackoff
FAIL
FAIL github.com/mochi-co/mqtt/server 334.052s
FAIL
If you repeat this test enough, the Resend backoff test will fail, resulting in two packets being sent. The test uses second-precision and a sleep of 1ms, so there's a small probability of sleeping so that time.Unix()
shows a difference of 1, so that the first resend happens.
We can rename this issue to capture correcting all "flaky" tests related to rare timing events and keep working on them. In any case, I run go test -race -count=1000 -timeout=100h .
and they're popping up pretty frequently.
This appears fixed in v1.2.0
but I'll leave it open just incase.
I am still seeing the TestServerResendClientInflightBackoff issue
Fixed in v1.3.0 due to a timing issue between writing and reading the err message from the close err atomic.