ndt-server
Test_DrainForeverButMeasureFor_NormalOperation failure
```
2020/02/11 21:04:42 c2s.go:165: Timed out
--- FAIL: Test_DrainForeverButMeasureFor_NormalOperation (0.11s)
    c2s_test.go:56: Expected positive byte count but got 0
```
See https://travis-ci.org/m-lab/ndt-server/builds/649118098#L3475
Yeah, this happened to me yesterday too.
```
2020/02/24 22:29:52 c2s.go:165: Timed out
--- FAIL: Test_DrainForeverButMeasureFor_NormalOperation (0.14s)
    c2s_test.go:56: Expected positive byte count but got 0
FAIL
```
Flaky test. :(
This appears to be caused by a genuine race. The `TCPInfo` data returned contains non-zero metrics for other fields, but the test only checks `BytesReceived`, which happens to be zero after the provided timeouts expire (some of the time). This is why `err` is `nil` in these cases.
The web100 "measurer" polls every 100ms: https://github.com/m-lab/ndt-server/blob/master/ndt5/web100/web100_linux.go#L55
The c2s "drain forever" unit tests provide timeouts of 100ms, 1s, and 1ms, depending on the variant.
This bug reports the "NormalOperation" variant, which uses 100ms. We may see this variant more often merely because it runs first.
Anecdotally, I've also seen an error from the "WsGoodput" variant, which uses 1ms, while working on the migration to the netx package.
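To make the race concrete, here is a self-contained sketch of the interaction (names and constants are illustrative, not the actual c2s or web100 code): a poller delivers byte-count samples every 100ms while the measurement window is also 100ms, so whether any sample arrives before the deadline depends entirely on scheduling, and no error is produced either way.

```go
package main

import (
	"fmt"
	"time"
)

// measure stands in for the web100 measurer: it delivers a byte-count sample
// on the channel every pollInterval. The real measurer reads TCPInfo from the
// kernel; a plain counter keeps this example self-contained.
func measure(pollInterval time.Duration, samples chan<- int64) {
	ticker := time.NewTicker(pollInterval)
	defer ticker.Stop()
	var bytes int64
	for range ticker.C {
		bytes += 1 << 16 // pretend 64 KiB arrived since the last poll
		select {
		case samples <- bytes:
		default: // drop the sample if nobody is ready to receive it
		}
	}
}

func main() {
	const pollInterval = 100 * time.Millisecond // measurer poll interval
	const measureFor = 100 * time.Millisecond   // "NormalOperation" measurement window

	samples := make(chan int64)
	go measure(pollInterval, samples)

	// Drain samples until the window closes. Because the window is no longer
	// than the poll interval, the first sample and the deadline race each
	// other: sometimes byteCount is still 0 when the window expires, and no
	// error is returned anywhere, which mirrors the flaky assertion.
	var byteCount int64
	deadline := time.After(measureFor)
	for done := false; !done; {
		select {
		case b := <-samples:
			byteCount = b
		case <-deadline:
			done = true
		}
	}
	fmt.Println("byte count after window:", byteCount) // sometimes 0, sometimes 65536
}
```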
PR #297 includes improvements to the unit tests by adjusting some of these timeouts.
Originally, I could trigger the test error simply by running this loop for a few minutes:
```sh
while go test -v -count=5 -timeout 30s github.com/m-lab/ndt-server/ndt5/c2s ; do sleep .1 ; done
```
After adjusting the timeouts, the loop ran overnight (~10 hours) without error. While this does not fundamentally fix the race, I think it improves the situation.
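For what it's worth, the intuition behind the adjustment can be stated as a ratio (constants below are illustrative, not the values from PR #297): once the measurement window is several multiples of the poll interval, it becomes very unlikely, though still not impossible, that zero samples land inside the window.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative values only; the real settings live in the c2s tests.
	const pollInterval = 100 * time.Millisecond // measurer sampling period
	const measureFor = 1 * time.Second          // adjusted measurement window

	// Roughly measureFor/pollInterval samples should arrive inside the window,
	// so only extreme scheduling delay leaves the byte count at zero. The race
	// still exists; it just becomes much harder to lose.
	fmt.Printf("expected samples per window: ~%d\n", int(measureFor/pollInterval))
}
```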