net/http: TestTransportMaxPerHostIdleConns failures
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
Issue created automatically to collect these failures.
Example (log):
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.52s)
transport_test.go:552: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2022-12-21 20:18 freebsd-arm-paulzhol go@fadd77c0 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.52s)
transport_test.go:552: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2022-12-01 22:04 freebsd-arm-paulzhol go@6a70292d net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.81s)
transport_test.go:552: after second response, idle conns = 1; want 2
2022-12-02 16:29 freebsd-arm-paulzhol go@1711f953 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
transport_test.go:552: after second response, idle conns = 1; want 2
Only observed on freebsd-arm-paulzhol so far, three times in total.
@paulzhol, any ideas?
(CC @neild)
I think it is related to the TestTransportMaxIdleConns failures. I tried looking at some of the log outputs here: https://github.com/golang/go/issues/56960#issuecomment-1344245556.
Couldn't find anything concrete, but I had a hunch that concurrent tests (maybe scheduled like they do on the slow builder) could be re-using ports (maybe even properly closed sockets but which are still in TIME_WAIT state in the kernel), so connecting to them ends up with an RST reply.
In these failures, the observed number of idle connections always seems to undershoot the expected number.
That leads me to suspect that the idle connections are being returned asynchronously, and for some reason they're a little too slow on this builder to be visible in the pool.
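To illustrate that hypothesis, here is a minimal, self-contained sketch of the kind of check that would tolerate an asynchronously updated idle-conn count. The waitFor helper and the simulated counter are stand-ins invented for this sketch; the real test inspects unexported Transport state directly.

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitFor polls cond until it returns true or the deadline expires.
func waitFor(d time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
		if cond() {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return cond()
}

func main() {
	// Stand-in for the transport's idle-connection count, which (per the
	// hypothesis above) is only updated some time after the response is read.
	var idleConns atomic.Int32
	go func() {
		time.Sleep(50 * time.Millisecond) // connection returned to the pool late
		idleConns.Store(2)
	}()

	// Checking once, immediately, could observe 0 or 1; polling tolerates the lag.
	if waitFor(2*time.Second, func() bool { return idleConns.Load() == 2 }) {
		fmt.Println("idle conns = 2, as expected")
	} else {
		fmt.Printf("after second response, idle conns = %d; want 2\n", idleConns.Load())
	}
}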
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-01-10 17:59 freebsd-arm-paulzhol go@82f09b75 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.32s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-01-19 20:46 darwin-amd64-13 go@24a9d7bc net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
2023-01-23 15:51 freebsd-arm-paulzhol go@e22bd234 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.06s)
transport_test.go:552: after second response, idle conns = 1; want 2
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-01-27 21:59 freebsd-arm-paulzhol go@b15297fc net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.18s)
transport_test.go:552: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
2023-01-31 13:55 freebsd-arm-paulzhol go@da0c375c net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.22s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
2023-02-01 19:47 linux-amd64-race go@ab0f0459 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-02-06 20:39 darwin-amd64-11_0 go@103f3749 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
transport_test.go:552: after second response, idle conns = 1; want 2
2023-02-10 18:59 darwin-amd64-13 go@ae2f1201 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.07s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-03-09 20:32 freebsd-arm-paulzhol go@7042ea62 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.11s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-03-14 03:09 darwin-amd64-13 go@8b3dc539 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
I had a hunch that concurrent tests (maybe scheduled like they do on the slow builder) could be re-using ports (maybe even properly closed sockets but which are still in TIME_WAIT state in the kernel)
If so, that's arguably a kernel bug: since these are loopback connections, the kernel shouldn't allow the port to be reused until it is done sending RST packets to that port. (And it has enough information to know that!)
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-03-17 04:44 freebsd-arm-paulzhol go@3360be4a net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.68s)
transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
I had a hunch that concurrent tests (maybe scheduled like they do on the slow builder) could be re-using ports (maybe even properly closed sockets but which are still in TIME_WAIT state in the kernel)
If so, that's arguably a kernel bug: since these are loopback connections, the kernel shouldn't allow the port to be reused until it is done sending RST packets to that port. (And it has enough information to know that!)
But we're always setting SO_REUSEADDR in setDefaultListenerSockopts() for TCP Listeners. This allows them to rebind to a still TIME_WAIT port.
The test will then try to Dial a new connection to that port, which the kernel will drop with RST if the 5 tuple matches.
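For context, a minimal sketch (Unix-like systems assumed) of the same option being applied explicitly through net.ListenConfig; the setsockopt call is roughly what setDefaultListenerSockopts does for TCP listeners, and it is what allows rebinding a port that still has a connection in TIME_WAIT.

package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

func main() {
	lc := net.ListenConfig{
		// The net package already sets SO_REUSEADDR on TCP listeners; this
		// Control hook just repeats the call explicitly for illustration.
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}

	ln, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr()) // may land on a port still in TIME_WAIT
}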
The source ports don't look like they are coming from the randomized ephemeral range at all in these logs:
https://github.com/golang/go/issues/56960#issuecomment-1344245556
The test will then try to Dial a new connection to that port, which the kernel will drop with RST if the 5 tuple matches.
The test may be assuming RFC 6191 semantics, which should cause the kernel to accept all incoming self-connections regardless of TIME-WAIT (since the kernel can always arrange for its own timestamps and/or sequence numbers to satisfy the requirements for accepting a SYN segment on a connection in the TIME-WAIT state).
Does the FreeBSD network stack implement RFC 6191 semantics?
The test may be assuming RFC 6191 semantics, which should cause the kernel to accept all incoming self-connections regardless of TIME-WAIT (since the kernel can always arrange for its own timestamps and/or sequence numbers to satisfy the requirements for accepting a SYN segment on a connection in the TIME-WAIT state).
Does the FreeBSD network stack implement RFC 6191 semantics?
I can't tell for sure; it looks disabled?
If we are looking for something along the lines of tcp_tw_reuse and tcp_tw_recycle (disabled after kernel v4.11) sysctls in Linux, I think we have nolocaltimewait in FreeBSD:
Suppress creating of compressed TCP TIME_WAIT states for connections in which both endpoints are local.
Alternatively I can try to disable syncookies which are supposed to interfere with the basic PAWS timestamps mechanism:
Syncookies have a certain number of disadvantages that a paranoid administrator may wish to take note of. Since the TCP options from the initial SYN are not saved, they are not applied to the connection, precluding use of features like window scale, timestamps, or exact MSS sizing.
I've started running the builder with net.inet.tcp.nolocaltimewait=1 since 2023-04-01 16:53:54.926870786 +0000 UTC.
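If it helps with verifying the builder setup, here is a tiny sketch (FreeBSD assumed, using golang.org/x/sys/unix, and assuming the node is a plain integer) that reads the setting back at runtime:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Reads net.inet.tcp.nolocaltimewait; per the tcp(4) text quoted above,
	// 1 means compressed TIME_WAIT states for local connections are suppressed.
	v, err := unix.SysctlUint32("net.inet.tcp.nolocaltimewait")
	if err != nil {
		fmt.Println("sysctl read failed:", err)
		return
	}
	fmt.Printf("net.inet.tcp.nolocaltimewait = %d\n", v)
}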
Change https://go.dev/cl/483116 mentions this issue: net/http: wait forever for write results in tests
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-04-07 11:34 freebsd-arm-paulzhol go@949fdd9f net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
--- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.48s)
transport_test.go:554: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2024-03-21 22:14 linux-s390x-ibm-race go@4f0408a3 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
testing.go:1413: race detected during execution of test