net: Dial does not respond to quickly-broken IPv6 connections by falling back to IPv4

Open oakad opened this issue 1 year ago • 17 comments

Go version

go version go1.22.4 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOOS='darwin'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.4/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.4/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/q9/qcgtwgsj0y72gr01_djqgmyw0000gq/T/go-build1826860007=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Trying to fetch a random module (all break the same):

% go get nhooyr.io/websocket
go package net: confVal.netCgo = false netGo = false
go package net: using cgo DNS resolver
go package net: hostLookupOrder(proxy.golang.org) = cgo
go: module nhooyr.io/websocket: Get "https://proxy.golang.org/nhooyr.io/websocket/@v/list": write tcp [fe80::bed0:74ff:fe64:598e%utun4]:56330->[2a00:1450:4003:80c::2011]:443: write: socket is not connected

Machine has IPv6 disabled:

% dig proxy.golang.org

; <<>> DiG 9.10.6 <<>> proxy.golang.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53713
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;proxy.golang.org. IN A

;; ANSWER SECTION:
proxy.golang.org. 46 IN A 142.250.184.177

;; Query time: 366 msec
;; SERVER: 10.20.141.5#53(10.20.141.5)
;; WHEN: Fri Jun 28 21:47:33 AEST 2024
;; MSG SIZE rcvd: 61

What did you see happen?

Go get is unable to fetch a module because it's using a wrong proxy address.

What did you expect to see?

Go get should be able to fetch a module.

oakad avatar Jun 28 '24 11:06 oakad

That's not proof that IPv6 is disabled, only that dig defaults to an A (IPv4) query.

seankhliao avatar Jun 28 '24 12:06 seankhliao

It is, I assure you. However, there's a caveat: we have a Cisco VPN which insists on advertising an additional resolver, and that resolver is able to resolve AAAA records ("Request A records, Request AAAA records"). Basically, I've got this config:

DNS configuration

resolver #1
  search domain[0] : heh
  nameserver[0] : heh
  nameserver[1] : heh
  flags : Request A records, Request AAAA records
  reach : 0x00000002 (Reachable)
  order : 1

DNS configuration (for scoped queries)

resolver #1
  nameserver[0] : heh
  nameserver[1] : heh
  if_index : 15 (en0)
  flags : Scoped, Request A records
  reach : 0x00000002 (Reachable)

resolver #2
  search domain[0] : heh
  nameserver[0] : heh
  nameserver[1] : heh
  if_index : 23 (utun4)
  flags : Scoped, Request A records, Request AAAA records
  reach : 0x00000002 (Reachable)
  order : 1

Still, go should not pick the AAAA address. Or, at least, it should not do so unconditionally, because I don't think our setup is uniquely broken. :-)

oakad avatar Jun 28 '24 12:06 oakad

From the output it is clear that the cgo resolver is being used, so out of our scope.

mateusz834 avatar Jun 28 '24 12:06 mateusz834

https://danp.net/posts/macos-dns-change-in-go-1-20/

This started happening relatively recently, and I believe it is caused by the changes described in the post above.

oakad avatar Jun 28 '24 12:06 oakad

Can you try forcing the go resolver and see if it helps in your case? GODEBUG=netdns=go

mateusz834 avatar Jun 28 '24 12:06 mateusz834

How do I enable both this feature and dns debug so we can see it is used for real?

oakad avatar Jun 28 '24 12:06 oakad

GODEBUG=netdns=go+2

mateusz834 avatar Jun 28 '24 12:06 mateusz834

Tough luck:

% go get nhooyr.io/websocket
go package net: confVal.netCgo = false netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(proxy.golang.org) = files,dns
go: module nhooyr.io/websocket: Get "https://proxy.golang.org/nhooyr.io/websocket/@v/list": write tcp [fe80::bed0:74ff:fe64:598e%utun4]:57052->[2a00:1450:4003:80c::2011]:443: write: socket is not connected

oakad avatar Jun 28 '24 12:06 oakad

For reference, curl does this:

% curl -v https://proxy.golang.org/nhooyr.io/websocket/@v/list

* Host proxy.golang.org:443 was resolved.
* IPv6: 2a00:1450:4003:80c::2011
* IPv4: 142.250.184.177
*   Trying 142.250.184.177:443...
*   Trying [2a00:1450:4003:80c::2011]:443...
* Connected to proxy.golang.org (142.250.184.177) port 443
* ALPN: curl offers h2,http/1.1

oakad avatar Jun 28 '24 12:06 oakad

What if you pass --ipv6 to curl?

In theory, Go's network stack should also be doing fast fallback (dual-stack IPv4 and IPv6).

seankhliao avatar Jun 28 '24 12:06 seankhliao

So the title is incorrect: it resolves correctly, but it fails to connect to the server when IPv6 is unavailable, right?

mateusz834 avatar Jun 28 '24 13:06 mateusz834

curl gets stuck when forced to use IPv6. It may be that, although the underlying adapter has IPv6 disabled, the Cisco VPN client pretends it has an IPv6 address on the utun interface. Yet this causes no issues anywhere else; everything works fine apart from Go.

% curl -v --ipv6 https://proxy.golang.org/nhooyr.io/websocket/@v/list

* Host proxy.golang.org:443 was resolved.
* IPv6: 2a00:1450:4003:80c::2011
* IPv4: (none)
*   Trying [2a00:1450:4003:80c::2011]:443...
... waits for timeout

oakad avatar Jun 28 '24 16:06 oakad

The address is of course correct; what is incorrect is resolving the AAAA record and sticking to it rather than using the A record. :-)

oakad avatar Jun 28 '24 16:06 oakad

From the discussion so far, it sounds like:

  1. Your Mac is configured with IPv6 enabled (that is, IPv6 sockets can be created successfully).
  2. Your DNS resolver is responding to AAAA requests with IPv6 addresses.
  3. Go looks up proxy.golang.org and gets both IPv6 and IPv4 addresses.
  4. Go connects to one of the IPv6 addresses seemingly successfully. Specifically, it does the connect and then runs getsockopt(fd, SOL_SOCKET, SO_ERROR) in net/fd_unix.go and gets syscall.EISCONN, which makes it return from Dial.
  5. A future write on that connection gets syscall.ENOTCONN, as shown in the error messages.

Normally, when IPv6 addresses can't be used, the connect never succeeds (fails or times out). In your case, it appears that the connect is succeeding but then the connection breaks very quickly after that, perhaps on the first write.

Do you know of anything strange about your Mac's network or IPv6 configuration? Or some firewall that is actively breaking IPv6 connections?

For example on my Mac:

% host proxy.golang.org
proxy.golang.org has address 142.250.65.177
proxy.golang.org has IPv6 address 2607:f8b0:4006:80e::2011
proxy.golang.org mail is handled by 40 alt4.gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 10 alt1.gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 5 gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 30 alt3.gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 20 alt2.gmr-smtp-in.l.google.com.
% sudo route add -inet6 2607:f8b0:4006:80e::2011 ::1
add host 2607:f8b0:4006:80e::2011: gateway ::1
% go mod download -json rsc.io/markdown@latest
{
	"Path": "rsc.io/markdown",
	"Version": "v0.0.0-20240617154923-1f2ef1438fed",
	"Query": "latest",
	"Info": "/Users/rsc/pkg/mod/cache/download/rsc.io/markdown/@v/v0.0.0-20240617154923-1f2ef1438fed.info",
	"GoMod": "/Users/rsc/pkg/mod/cache/download/rsc.io/markdown/@v/v0.0.0-20240617154923-1f2ef1438fed.mod",
	"Zip": "/Users/rsc/pkg/mod/cache/download/rsc.io/markdown/@v/v0.0.0-20240617154923-1f2ef1438fed.zip",
	"Dir": "/Users/rsc/pkg/mod/rsc.io/markdown@v0.0.0-20240617154923-1f2ef1438fed",
	"Sum": "h1:savaUwUp0YCIxdaF9EFOMB3j+TQnoLop+cNp2KPC9jk=",
	"GoModSum": "h1:rzOcjAz36Xzvwf6iaJSYXkmNbvu5XHelis1egIN0Cys="
}
% curl -v --ipv6 https://proxy.golang.org
* Host proxy.golang.org:443 was resolved.
* IPv6: 2607:f8b0:4006:80e::2011
* IPv4: (none)
*   Trying [2607:f8b0:4006:80e::2011]:443...
^C
% sudo route delete -inet6 2607:f8b0:4006:80e::2011 
delete host 2607:f8b0:4006:80e::2011
% curl -v --ipv6 https://proxy.golang.org
* Host proxy.golang.org:443 was resolved.
* IPv6: 2607:f8b0:4006:80e::2011
* IPv4: (none)
*   Trying [2607:f8b0:4006:80e::2011]:443...
* Immediate connect fail for 2607:f8b0:4006:80e::2011: No route to host
* Failed to connect to proxy.golang.org port 443 after 3 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to proxy.golang.org port 443 after 3 ms: Couldn't connect to server
% 

rsc avatar Jun 28 '24 19:06 rsc

The problem only happens with VPN enabled, I mentioned it before. The VPN in question is Cisco secure client, aka AnyConnect. I'm working with people who manage the Cisco VPN for us to see if they can change anything on their side (AnyConnect is supposed to be server side controlled, so not much can be done on the client side).

  1. Only Go breaks on our current setup; all other applications seem to work just fine. Go used to work; it only started breaking relatively recently (possibly caused by the 1.20 changes or by some changes to the AnyConnect setup).
  2. Go can be made to work by using ifconfig to erase IPv6 addresses from the utun device in use by AnyConnect. This, however, has to be done on any VPN reconnection (due to how AnyConnect works).

oakad avatar Jun 30 '24 18:06 oakad

@rsc I get the same issue when trying to install things using 1.22.6 on my MacBook while on our corporate VPN (which is also Cisco AnyConnect).

My testing reveals there are two underlying issues:

  1. The IPv4 dial (which, contrary to the documentation, actually happens first, see #68795) takes longer than 300 milliseconds to complete.
  2. Somehow Go believes the IPv6 dial succeeded even though it clearly did not. (Kernel bug?)

Increasing the dialer's FallbackDelay (or making it negative) is enough to resolve the issue, but I have no control over what go install is doing. Would it be possible to allow overriding the 300 ms default via some env var?

rittneje avatar Aug 25 '24 23:08 rittneje