go icon indicating copy to clipboard operation
go copied to clipboard

proposal: net: Add control callback before connection creation to Dialer and ListenConfig

Open leonshaw opened this issue 2 years ago • 4 comments

Background

Sometimes we need to dial inside another net namespace and switch back. On linux, namespace is thread-level, so a thread lock is needed.

With "github.com/vishvananda/netns" for example:

func connect(target string, network, addr string) net.Conn {
	dialer := &net.Dialer{}
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	origNS, _ := netns.Get()
	targetNS, _ := netns.GetFromName(target)
	netns.Set(targetNS)
	defer netns.Set(origNS)
	conn, _ := dialer.Dial(network, addr)
	return conn
}

However, this blocks the current thread until Dial() returns. A workaround may be moving the unlocking stuff into dialer.Control:

	dialer.Control = func(network, address string, c syscall.RawConn) error {
		netns.Set(origNS)
		runtime.UnlockOSThread()
		return nil
	}

This doesn't work well because dialParallel() dials in new goroutines which are out of control.

Proposal

Add callback before and after the connection (socket) is created in net.Dialer:

type Dialer struct {
	PreControl func(network, address string) error
	PostControl func(network, address string) error
}

The call of PreControl - PostControl pair should be in the same goroutine and surround the socket syscall. PostControl should be called regardless of whether the conn is successfully created.

Similar thing can be done with net.ListenConfig.

leonshaw avatar Aug 06 '22 08:08 leonshaw

Can you explain exactly which system calls must be made with the new namespace, and which can be made with the original namespace? For example, are you saying that socket must be called with the new namespace, but that it's OK to call connect with the original namespace?

If that is the distinction, can't we use the existing Control field? So perhaps that is not the distinction, but then what is?

Thanks.

ianlancetaylor avatar Aug 06 '22 14:08 ianlancetaylor

Can you explain exactly which system calls must be made with the new namespace, and which can be made with the original namespace? For example, are you saying that socket must be called with the new namespace, but that it's OK to call connect with the original namespace?

Yes. AFAIK of linux, a socket is bound to the thread's namespace at creation time. Once created, it's OK for connect or other syscalls to be called in any namespace.

If that is the distinction, can't we use the existing Control field? So perhaps that is not the distinction, but then what is?

Control could work if there's something like SO_NETNS in setsockopt, but there isn't, and we have to set namespace of thread. Then we have to make sure that both setns syscalls happen in the same thread, otherwise the status is leaked. It should also be guaranteed that no new goroutine is spawned between the two setnss, otherwize the (potentially) new thread is leaked to the new namespace. See the background, in current implementation, dialParallel() callsControl in new goroutines, and the dialer also spawns new goroutine if Dialer.Cacel is not nil.

leonshaw avatar Aug 06 '22 15:08 leonshaw

Just to be sure I'm clear, it sounds like the problem is that you want to set the namespace before calling socket. Setting the namespace requires locking the thread, and you want to unlock the thread before calling connect. Using the Dial.Control field doesn't work because at that point socket has already been called, and it may have been called by a different goroutine running on a different thread.

If I understand correctly, then the code you show above won't work, because dialer.Dial might call socket on a different goroutine which will not be running on the locked thread that called netns.Set. Is that correct?

ianlancetaylor avatar Aug 06 '22 19:08 ianlancetaylor

Just to be sure I'm clear, it sounds like the problem is that you want to set the namespace before calling socket. Setting the namespace requires locking the thread, and you want to unlock the thread before calling connect. Using the Dial.Control field doesn't work because at that point socket has already been called, and it may have been called by a different goroutine running on a different thread.

Yes, exactly.

If I understand correctly, then the code you show above won't work, because dialer.Dial might call socket on a different goroutine which will not be running on the locked thread that called netns.Set. Is that correct?

Yes. The lock-setns-socket-setns-unlock sequence must happen in the same goroutine.

leonshaw avatar Aug 06 '22 19:08 leonshaw