net: Resolver doesn't use provided Dial function in all cases

Open chriso opened this issue 2 years ago • 6 comments

What version of Go are you using (go version)?

$ go version
go version go1.20.3 linux/arm64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/tmp/go"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/lib/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/lib/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.20"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.20/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.20.3"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1714904807=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The net.Resolver accepts an optional Dial function, which is documented as follows:

type Resolver struct {
	// Dial optionally specifies an alternate dialer for use by
	// Go's built-in DNS resolver to make TCP and UDP connections
	// to DNS services. The host in the address parameter will
	// always be a literal IP address and not a host name, and the
	// port in the address parameter will be a literal port number
	// and not a service name.
	// If the Conn returned is also a PacketConn, sent and received DNS
	// messages must adhere to RFC 1035 section 4.2.1, "UDP usage".
	// Otherwise, DNS messages transmitted over Conn must adhere
	// to RFC 7766 section 5, "Transport Protocol Selection".
	// If nil, the default dialer is used.
	Dial func(ctx context.Context, network, address string) (Conn, error)
}

I created a script that logs Dial calls when using the pure Go resolver: https://go.dev/play/p/0O_ARZyK2eG

If I run this script locally, I see something like this:

$ ./resolve
Dial(udp, 127.0.0.53:53)
Dial(udp, 127.0.0.53:53)
{172.217.24.46 }
{2404:6800:4006:804::200e }

However, if I run the script with strace, I see that Go is making additional connections some other way:

$ strace ./resolve 2>&1 | grep '^connect'
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(9), sin_addr=inet_addr("172.217.24.46")}, 16) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(9), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2404:6800:4006:804::200e", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)

There is one hardcoded call to net.DialUDP here, which appears to be the source of the additional connections.

What did you expect to see?

I expect to see the Dial function used for all connections made by the pure Go resolver.

What did you see instead?

I see that the Dial function is only used in some cases.

Additional context

CL 500576 fixes the issue by using net.Resolver.Dial in all cases.

For context, this change is important for targets with limited networking capabilities (e.g. GOOS=wasip1). It means that users can provide their own Dial function to make use of the pure Go resolver. At the moment the hardcoded net.DialUDP call makes the pure Go resolver off limits for these targets.

There was some concern in the CL about whether making this change for all targets would break code in the wild. I'm submitting it as a bug report so we can discuss here instead.

cc GOOS=wasip1 maintainers: @achille-roussel @johanbrandhorst @Pryz

cc those that commented on CL 500576: @mateusz834 @ianlancetaylor

chriso avatar Jun 09 '23 22:06 chriso

If I replace the hardcoded DialUDP call with r.dial("udp") then the provided Dial function is used in all cases.

-c, err = DialUDP("udp", nil, &dst)
+c, err = r.dial(ctx, "udp", dst.IP.String())

This has the additional benefit of threading the lookup context through to the underlying dialer.

If we're concerned about breaking code in the wild, we could instead opt in by target and take this path for GOOS=wasip1 only for now (since it has limited networking capabilities, and DialUDP always fails there).

This approach was suggested by @mateusz834:

if runtime.GOOS == "wasip1" {
    c, err = r.dial(ctx, "udp", dst.IP.String())
} else {
    c, err = DialUDP("udp", nil, &dst)
}

@ianlancetaylor suggested that we might instead require an additional hook:

type Resolver struct {
    Dial func(ctx context.Context, network, address string) (Conn, error)
    
    // Extra hook:
    DialUDP func(ctx context.Context, network, address string) (Conn, error)
}

or something like this:

type Resolver struct {
    Dial func(ctx context.Context, network, address string) (Conn, error)
    
    // Extra hook:
    UDPConnect func(ctx context.Context, addr *UDPAddr) (*UDPAddr, bool)
}

chriso avatar Jun 09 '23 22:06 chriso

Change https://go.dev/cl/500576 mentions this issue: net: prefer Resolver.Dial over DialUDP on wasip1

gopherbot avatar Jun 09 '23 23:06 gopherbot

The runtime.GOOS == "wasip1" guard was just a simple fix idea, but I agree with @ianlancetaylor that per-platform behaviour is not ideal in this case.

I think this hook should be named something like IsAddrReachable, so that the intention is clear. It should probably also use netip.Addr at this point.

type Resolver struct {
    // IsAddrReachable is used for address sorting by the Go resolver.
    // When this field is nil, the default dialer is used: addr is considered
    // reachable when the default dialer successfully establishes a UDP connection to addr.
    IsAddrReachable func(ctx context.Context, addr netip.Addr) (local netip.Addr, reachable bool)
}

mateusz834 avatar Jun 10 '23 06:06 mateusz834

CL 502315 improved the situation for wasip1 by addressing the panic in net.DialUDP. Since it no longer panics, an error from the hardcoded call only affects the sort order.

chriso avatar Jun 14 '23 02:06 chriso

What would be an option for DNS resolution in Chrome with wasip1?

remyleone avatar Aug 22 '25 16:08 remyleone

Hey folks, I got some time and decided to create a repro that can help with fixing this issue on other OSes such as macOS (Darwin). It uses ktrace, since strace is only available on Linux. For example:

//go:build darwin
// +build darwin

package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"regexp"
	"strings"
	"syscall"
	"time"
)

const progGo = `
package main

import (
	"context"
	"fmt"
	"net"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	var d net.Dialer

	r := net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			fmt.Printf("Dial(%s, %s)\n", network, address)
			return d.DialContext(ctx, network, address)
		},
	}

	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGIO, syscall.SIGTERM)
	defer cancel()

	for {
		select {
		case <-ctx.Done():
			return

		case <-time.After(1 * time.Second):
			ips, err := r.LookupIPAddr(ctx, "google.com")
			if err != nil {
				panic(err)
			}
			for _, ip := range ips {
				fmt.Println(ip)
			}
		}
	}
}
`

func main() {
	tmpDir, err := os.MkdirTemp("", "60712")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(tmpDir)

	path := filepath.Join(tmpDir, "outf.go")
	if err := os.WriteFile(path, []byte(progGo), 0755); err != nil {
		panic(err)
	}
	binaryPath := filepath.Join(tmpDir, "ourbin")

	ctx := context.Background()
	if err := exec.CommandContext(ctx, "go", "build", "-o", binaryPath, path).Run(); err != nil {
		panic(err)
	}
	cmd := exec.CommandContext(ctx, binaryPath)
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	ktraceCmd := exec.CommandContext(ctx, "sudo", "ktrace", "trace", "-p", fmt.Sprintf("%d", cmd.Process.Pid))
	stdout := new(bytes.Buffer)
	ktraceCmd.Stdout = stdout
	if err := ktraceCmd.Start(); err != nil {
		println(stdout.String())
		panic(err)
	}

	<-time.After(5 * time.Second)
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		panic(err)
	}
	if err := ktraceCmd.Process.Signal(syscall.SIGTERM); err != nil {
		panic(err)
	}

	regConnect := regexp.MustCompile(".*connect.*")
	if matches := regConnect.FindAllString(stdout.String(), -1); len(matches) != 0 {
		println("Found connect like syscall invocations")
		// The header is at the first line.
		if i := strings.Index(stdout.String(), "\n"); i >= 0 {
			println(stdout.String()[:i])
		}
		for _, match := range matches {
			println(match)
		}
		if err := os.WriteFile("ktrace.log", stdout.Bytes(), 0755); err != nil {
			panic(err)
		}
		panic("found credible match, entire written to: ktrace.log")
	}
	println("no match")
}

It must be run as superuser, and it then prints:

$ sudo go run main.go 
Found connect like syscall invocations
walltime                          delta(us)(duration)    debug-id                             arg1             arg2             arg3             arg4             thread-id        cpu  process-name(pid)                             
2025-12-09 23:07:53.705276 EST          1.8              BSC_connect                          6                2e7c5f8b612c     10               2e7c5f8c2be8     103530d            4(AP) ourbin(76027)  
2025-12-09 23:07:53.705290 EST          1.7              BSC_connect                          7                2e7c5f71c04c     10               2e7c5f8bcbe8     103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:53.705304 EST          7.5(27.8)        BSC_connect                          0                0                0                128fb            103530d            4(AP) ourbin(76027)  
2025-12-09 23:07:53.705309 EST          0.6(19.5)        BSC_connect                          0                0                0                128fb            103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:54.002628 EST          1.7              BSC_connect                          6                2e7c5f8b419c     1c               2e7c5f8c1148     103530c            0(AP) ourbin(76027)  
2025-12-09 23:07:54.002641 EST          5.3(12.4)        BSC_connect                          41               0                0                128fb            103530c            0(AP) ourbin(76027)  
2025-12-09 23:07:54.002659 EST          1.2              BSC_connect                          6                2e7c5f8b61ac     10               2e7c5f8c1148     103530c            0(AP) ourbin(76027)  
2025-12-09 23:07:54.002674 EST          8.3(14.7)        BSC_connect                          0                0                0                128fb            103530c            0(AP) ourbin(76027)  
2025-12-09 23:07:55.004255 EST          1.9              BSC_connect                          7                2e7c5f71c0ac     10               2e7c5f8c0be8     1035321            2(AP) ourbin(76027)  
2025-12-09 23:07:55.004255 EST          0.3              BSC_connect                          6                2e7c5f92204c     10               2e7c5f920be8     103530c            0(AP) ourbin(76027)  
2025-12-09 23:07:55.004287 EST          8.1(32.4)        BSC_connect                          0                0                0                128fb            1035321            2(AP) ourbin(76027)  
2025-12-09 23:07:55.004297 EST          0.2(41.3)        BSC_connect                          0                0                0                128fb            103530c            0(AP) ourbin(76027)  
2025-12-09 23:07:55.010853 EST          0.3              BSC_connect                          6                2e7c5f80e81c     1c               2e7c5f91f148     103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:55.010886 EST         20.3(33.2)        BSC_connect                          41               0                0                128fb            103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:55.010947 EST          3.4              BSC_connect                          6                2e7c5f71c12c     10               2e7c5f91f148     103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:55.011003 EST         21.4(56.0)        BSC_connect                          0                0                0                128fb            103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:56.011587 EST          0.2              BSC_connect                          7                2e7c5f71c18c     10               2e7c5f91ebe8     1035321            2(AP) ourbin(76027)  
2025-12-09 23:07:56.011600 EST          0.3              BSC_connect                          6                2e7c5f8b622c     10               2e7c5f91abe8     103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:56.011614 EST         12.9(27.1)        BSC_connect                          0                0                0                128fb            1035321            2(AP) ourbin(76027)  
2025-12-09 23:07:56.011622 EST          0.3(21.8)        BSC_connect                          0                0                0                128fb            103530c            6(AP) ourbin(76027)  
2025-12-09 23:07:56.016401 EST          2.4              BSC_connect                          6                2e7c5f80e85c     1c               2e7c5f8c3148     103530f            6(AP) ourbin(76027)  
2025-12-09 23:07:56.016415 EST          0.2(14.2)        BSC_connect                          41               0                0                128fb            103530f            6(AP) ourbin(76027)  
2025-12-09 23:07:56.016440 EST          1.6              BSC_connect                          6                2e7c5f71c20c     10               2e7c5f8c3148     103530f            6(AP) ourbin(76027)  
2025-12-09 23:07:56.016466 EST          8.1(25.8)        BSC_connect                          0                0                0                128fb            103530f            6(AP) ourbin(76027)  
panic: found credible match, entire written to: ktrace.log

goroutine 1 [running]:
main.main()
	/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/60712/main.go:114 +0x79b
exit status 2

In this output, BSC_connect is the BSD connect system call. I believe this program can also be tailored to check the behavior once the issue is fixed.

odeke-em avatar Dec 10 '25 04:12 odeke-em