go icon indicating copy to clipboard operation
go copied to clipboard

net: TestFileCloseRace failures with `connection reset by peer`

Open gopherbot opened this issue 2 years ago • 2 comments

#!watchflakes
post <- pkg == "net" && test == "TestFileCloseRace" && `connection reset by peer`

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestFileCloseRace (0.00s)
    file_test.go:321: dial tcp 127.0.0.1:65102: connect: connection reset by peer

watchflakes

gopherbot avatar Feb 02 '23 20:02 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net" && test == "TestFileCloseRace"
2023-02-02 19:39 netbsd-amd64-9_3 go@fcd0e096 net.TestFileCloseRace (log)
--- FAIL: TestFileCloseRace (0.00s)
    file_test.go:321: dial tcp 127.0.0.1:65102: connect: connection reset by peer

watchflakes

gopherbot avatar Feb 02 '23 20:02 gopherbot

@golang/netbsd, this is the first we've seen of this failure mode, and it looks to me like a bug somewhere in the (Go or kernel) network stack.

The handler reads a byte from the connection before closing it, so I don't see why Dial should ever fail with ECONNRESET. (https://cs.opensource.google/go/go/+/master:src/net/file_test.go;l=306-308;drc=b55cbbb9e76969d67fbc6e264a584ad18c2f95fa)

bcmills avatar Feb 03 '23 14:02 bcmills

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net" && test == "TestFileCloseRace" && `connection reset by peer`
2023-03-24 02:45 netbsd-amd64-9_3 go@33b63480 net.TestFileCloseRace (log)
--- FAIL: TestFileCloseRace (0.01s)
    file_test.go:321: dial tcp 127.0.0.1:64962: connect: connection reset by peer

watchflakes

gopherbot avatar Mar 24 '23 03:03 gopherbot

@golang/netbsd, this still looks like a kernel bug to me, and I've found at least two other reports of it (#60743 and #52168).

bcmills avatar Jun 13 '23 14:06 bcmills

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-05-02 02:52 netbsd-386-9_3 vuln@76abf843 go@c3c53a2c x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.55s)
    --- FAIL: TestByModules/all (0.13s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:277: Get "http://127.0.0.1:65467/index/modules.json.gz": dial tcp 127.0.0.1:65467: connect: connection reset by peer
2023-05-05 20:14 netbsd-amd64-9_3 vuln@d573de52 go@f3034bed x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.30s)
    --- FAIL: TestByModules/all (0.12s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:277: Get "http://127.0.0.1:65468/index/modules.json.gz": dial tcp 127.0.0.1:65468: connect: connection reset by peer
2023-06-01 14:35 netbsd-amd64-9_3 net@10cf3880 go@db3f952b x/net/ipv6.TestConnInitiatorPathMTU (log)
--- FAIL: TestConnInitiatorPathMTU (0.00s)
    sockopt_test.go:40: dial tcp6 [::1]:65509: connect: connection reset by peer
    asm_amd64.s:1650: accept tcp6 [::1]:65509: use of closed network connection
2023-06-08 16:11 netbsd-386-9_3 net@5541298b go@1db23771 x/net/ipv6.TestConnUnicastSocketOptions (log)
--- FAIL: TestConnUnicastSocketOptions (0.00s)
    unicastsockopt_test.go:44: dial tcp6 [::1]:65504: connect: connection reset by peer

watchflakes

gopherbot avatar Jun 13 '23 14:06 gopherbot

Is there a reproducer for this? Does ktrace confirm the kernel is actually returning ECONNRESET?

riastradh avatar Jun 13 '23 18:06 riastradh

The failure mode appears to be intermittent, and nobody has done the work to find a more reliable reproducer. (I would generally consider that to be part of the port maintainer's role.)

We don't have any way to get ktrace logs from the builders for a given run.

bcmills avatar Jun 13 '23 19:06 bcmills

Can you remind me how to run just this test, at least, starting from a clean go source tree and an existing go binary?

riastradh avatar Jun 13 '23 19:06 riastradh

Starting from a clean source tree, you'd do something like:

$ cd src
src$ ./make.bash
…
src$ ../bin/go test net -run=TestFileCloseRace

That said, I suspect that the failure may depend on load (generally lots of these tests are running concurrently, so maybe it takes just the right combination of concurrent system calls, or just the right timing between closing one server on the port and opening another one).

bcmills avatar Jun 13 '23 19:06 bcmills

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-06-29 19:38 netbsd-386-9_3 vuln@2f47882f go@c4590af1 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.67s)
    --- FAIL: TestByModules/all (0.30s)
        --- FAIL: TestByModules/all/http (0.02s)
            client_test.go:287: Get "http://127.0.0.1:65474/ID/GO-2021-0068.json.gz": dial tcp 127.0.0.1:65474: connect: connection reset by peer

watchflakes

gopherbot avatar Jun 30 '23 17:06 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-06-30 20:27 netbsd-amd64-9_3 vuln@a443f34b go@18e17e2c x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.47s)
    --- FAIL: TestByModules/all (0.19s)
        --- FAIL: TestByModules/all/http (0.01s)
            client_test.go:287: Get "http://127.0.0.1:65469/ID/GO-2022-0569.json.gz": dial tcp 127.0.0.1:65469: connect: connection reset by peer

watchflakes

gopherbot avatar Jun 30 '23 20:06 gopherbot

Starting from a clean source tree, you'd do something like:

$ cd src
src$ ./make.bash
…
src$ ../bin/go test net -run=TestFileCloseRace

When I run this it prints:

ok      net     (cached) [no tests to run]

Is that to be expected? Is it actually running the test? Do I have to redo make.bash each time I want to rerun the test?

riastradh avatar Jul 02 '23 17:07 riastradh

Typo, had written FileTestCloseRace, instead of copying & pasting TestFileCloseRace, sorry for the noise.

riastradh avatar Jul 02 '23 18:07 riastradh

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-06-30 20:27 netbsd-386-9_3 vuln@a443f34b go@e126572f x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.63s)
    --- FAIL: TestByModules/all (0.29s)
        --- FAIL: TestByModules/all/http (0.03s)
            client_test.go:287: Get "http://127.0.0.1:65472/ID/GO-2021-0240.json.gz": dial tcp 127.0.0.1:65472: connect: connection reset by peer

watchflakes

gopherbot avatar Jul 05 '23 15:07 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-08-16 01:59 netbsd-amd64-9_3 review@fbc0ad72 go@ce160863 x/review/git-codereview.TestPendingGerritMultiChange15 (log)
git-codereview: fetch /a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789: Get "/a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789": unsupported protocol scheme ""
git-codereview: fetch /a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789: Get "/a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789": unsupported protocol scheme ""
git-codereview: fetch /a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789: Get "/a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789": unsupported protocol scheme ""
git-codereview: fetch /a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789: Get "/a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Atmp%2Fworkdir%2Ftmp%2Fgit-codereview-test1227261198%2Fgit-origin~main~I123456789": unsupported protocol scheme ""
git-codereview: fetch /a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Aproj~main~I0000000004&q=change%3Aproj~main~I0000000003&q=change%3Aproj~main~I0000000002&q=change%3Aproj~main~I0000000001&q=change%3Aproj~main~I123456789: Get "http://127.0.0.1:65520/a/changes/?o=DETAILED_LABELS&o=CURRENT_REVISION&o=MESSAGES&o=DETAILED_ACCOUNTS&q=change%3Aproj~main~I0000000004&q=change%3Aproj~main~I0000000003&q=change%3Aproj~main~I0000000002&q=change%3Aproj~main~I0000000001&q=change%3Aproj~main~I123456789": dial tcp 127.0.0.1:65520: connect: connection reset by peer
--- FAIL: TestPendingGerritMultiChange15 (0.54s)
    pending_test.go:396: git-codereview pending -s
    pending_test.go:396: stdout:
        work d83eb29..7ec9bff (current branch)
        + 7ec9bff v14
...
        ++ REVHASH msg

git-codereview: fetch /a/changes/proj~main~I123456789?o=LABELS&o=CURRENT_REVISION: change not found on Gerrit server
Gerrit response:
404 page not found

git-codereview: fetch /a/changes/proj~main~I123456789?o=LABELS&o=CURRENT_REVISION: http://127.0.0.1:65517/a/changes/proj~main~I123456789?o=LABELS&o=CURRENT_REVISION: malformed json response
Gerrit response:
)]}'
XXX

watchflakes

gopherbot avatar Aug 18 '23 21:08 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-08-17 17:30 netbsd-386-9_3 vuln@da4b74a5 go@70aa116c x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.60s)
    --- FAIL: TestByModules/all (0.28s)
        --- FAIL: TestByModules/all/http (0.03s)
            client_test.go:287: Get "http://127.0.0.1:65473/ID/GO-2022-0475.json.gz": dial tcp 127.0.0.1:65473: connect: connection reset by peer

watchflakes

gopherbot avatar Aug 21 '23 09:08 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-08-30 19:53 netbsd-amd64-9_3 go@dffc47e7 net.TestDialTimeout (log)
--- FAIL: TestDialTimeout (0.03s)
    --- FAIL: TestDialTimeout/1ms/5s (0.00s)
        timeout_test.go:127: Dial returned after 145.796µs; want ≥1ms
        timeout_test.go:135: Dial: dial tcp 127.0.0.1:64831: connect: connection reset by peer, want timeout
    timeout_test.go:56: ignored 47 spurious connections

watchflakes

gopherbot avatar Sep 08 '23 17:09 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-09-14 17:21 netbsd-amd64-9_3 vuln@3881ca88 go@e356aa65 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.69s)
    --- FAIL: TestByModules/all (0.26s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65476/ID/GO-2022-0572.json.gz": dial tcp 127.0.0.1:65476: connect: connection reset by peer

watchflakes

gopherbot avatar Sep 14 '23 21:09 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-10-02 19:08 netbsd-386-9_3 vuln@6987ccb7 go@b201cf4a x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.62s)
    --- FAIL: TestByModules/all (0.20s)
        --- FAIL: TestByModules/all/http (0.01s)
            client_test.go:287: Get "http://127.0.0.1:65473/ID/GO-2022-0475.json.gz": dial tcp 127.0.0.1:65473: connect: connection reset by peer

watchflakes

gopherbot avatar Oct 10 '23 00:10 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-10-24 20:26 netbsd-amd64-9_3 go@9ab51216 net/http.TestHeadBody (log)
--- FAIL: TestHeadBody (0.00s)
    --- FAIL: TestHeadBody/h1 (0.00s)
        --- FAIL: TestHeadBody/h1/identity (0.00s)
            serve_test.go:6931: Head "http://127.0.0.1:55017": dial tcp 127.0.0.1:55017: connect: connection reset by peer

watchflakes

gopherbot avatar Oct 25 '23 15:10 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-10-18 15:02 netbsd-amd64-9_3 vuln@33c23e23 go@884c93a6 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.57s)
    --- FAIL: TestByModules/all (0.21s)
        --- FAIL: TestByModules/all/http (0.02s)
            client_test.go:287: Get "http://127.0.0.1:65473/ID/GO-2021-0068.json.gz": dial tcp 127.0.0.1:65473: connect: connection reset by peer

watchflakes

gopherbot avatar Oct 25 '23 18:10 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-10-30 20:54 netbsd-386-9_3 vuln@0f3617a9 go@998fdce3 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.47s)
    --- FAIL: TestByModules/all (0.17s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65471/ID/GO-2022-0569.json.gz": dial tcp 127.0.0.1:65471: connect: connection reset by peer

watchflakes

gopherbot avatar Oct 30 '23 22:10 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-11-21 20:23 netbsd-amd64-9_3 vuln@286bb05c go@539b4d8d x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.44s)
    --- FAIL: TestByModules/all (0.17s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65472/ID/GO-2021-0068.json.gz": dial tcp 127.0.0.1:65472: connect: connection reset by peer

watchflakes

gopherbot avatar Nov 21 '23 21:11 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-11-21 20:23 netbsd-amd64-9_3 vuln@286bb05c go@e4ea6283 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.48s)
    --- FAIL: TestByModules/all (0.13s)
        --- FAIL: TestByModules/all/http (0.01s)
            client_test.go:287: Get "http://127.0.0.1:65471/ID/GO-2021-0068.json.gz": dial tcp 127.0.0.1:65471: connect: connection reset by peer

watchflakes

gopherbot avatar Nov 22 '23 17:11 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-12-05 20:09 netbsd-amd64-9_3 vuln@c27f855e go@bb34112d x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.56s)
    --- FAIL: TestByModules/all (0.21s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65474/ID/GO-2022-0229.json.gz": dial tcp 127.0.0.1:65474: connect: connection reset by peer
2023-12-06 17:59 netbsd-amd64-9_3 go@3f2bf706 net/http.TestContentEncodingNoSniffing (log)
--- FAIL: TestContentEncodingNoSniffing (0.00s)
    --- FAIL: TestContentEncodingNoSniffing/h2 (0.05s)
        --- FAIL: TestContentEncodingNoSniffing/h2/no_content-encoding (0.00s)
            serve_test.go:6315: Failed to fetch URL: Get "https://127.0.0.1:64314": dial tcp 127.0.0.1:64314: connect: connection reset by peer

watchflakes

gopherbot avatar Dec 08 '23 20:12 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2023-12-20 04:11 netbsd-386-9_3 vuln@73356279 go@ee4b61a3 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.48s)
    --- FAIL: TestByModules/all (0.17s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65472/ID/GO-2021-0264.json.gz": dial tcp 127.0.0.1:65472: connect: connection reset by peer

watchflakes

gopherbot avatar Jan 11 '24 17:01 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2024-01-12 13:27 netbsd-amd64-9_3 vuln@d8d123b8 go@e58e8139 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.57s)
    --- FAIL: TestByModules/all (0.14s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65473/ID/GO-2022-0572.json.gz": dial tcp 127.0.0.1:65473: connect: connection reset by peer

watchflakes

gopherbot avatar Jan 13 '24 11:01 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2024-01-22 17:47 netbsd-386-9_3 vuln@8fb35e0f go@9368ced7 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.40s)
    --- FAIL: TestByModules/all (0.15s)
        --- FAIL: TestByModules/all/http (0.00s)
            client_test.go:287: Get "http://127.0.0.1:65472/ID/GO-2022-0569.json.gz": dial tcp 127.0.0.1:65472: connect: connection reset by peer

watchflakes

gopherbot avatar Jan 23 '24 20:01 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2024-01-25 15:53 netbsd-386-9_3 go@67d30fc3 net/http.TestH12_RequestContentLength_Known_Zero (log)
--- FAIL: TestH12_RequestContentLength_Known_Zero (0.00s)
    clientserver_test.go:318: HTTP/2 request: Post "https://127.0.0.1:64609": dial tcp 127.0.0.1:64609: connect: connection reset by peer

watchflakes

gopherbot avatar Jan 25 '24 20:01 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "netbsd" && `connect: connection reset by peer`
2024-02-06 19:26 netbsd-386-9_3 vuln@55070634 go@1400b268 x/vuln/internal/client.TestByModules (log)
--- FAIL: TestByModules (0.70s)
    --- FAIL: TestByModules/all (0.22s)
        --- FAIL: TestByModules/all/http (0.01s)
            client_test.go:287: Get "http://127.0.0.1:65473/ID/GO-2022-0569.json.gz": dial tcp 127.0.0.1:65473: connect: connection reset by peer

watchflakes

gopherbot avatar Feb 08 '24 16:02 gopherbot