dht icon indicating copy to clipboard operation
dht copied to clipboard

Panic on WSA transient errors

Open elgatito opened this issue 7 years ago • 9 comments
trafficstars

Panic caught by user on windows.

http://paste.ubuntu.com/26059096/

There is a Client config on line 137.

elgatito avatar Nov 27 '17 17:11 elgatito

this is due to how the server runs the accept listener in a routine and panics on basically any error =(

the serve method should be exposed and allow the caller to handle any errors.

james-lawrence avatar May 14 '18 01:05 james-lawrence

I think that is the best approach. Is anyone still seeing this panic occur?

anacrolix avatar May 15 '18 04:05 anacrolix

Let's reopen if someone has the problem again.

anacrolix avatar May 25 '18 07:05 anacrolix

@anacrolix I wanted to run the dht-server and encountered this problem. I think I have a solution to this but it might involve a patch to Go and I need to do homework on which WSA errors are and are not to be considered temporary. I'm going to investigate this after I get some sleep but wanted to make this post so I don't forget to follow up with this. Screencast related: https://www.youtube.com/watch?v=GAkn9KI4ceY Will submit a PR later.

djdv avatar Sep 20 '20 14:09 djdv

@djdv Really cool to watch your screencast. Your process and conclusions are spot on regarding WSA errors, I think this has been reported, and commented on in various places in my own codebases, and in golang/go.

anacrolix avatar Sep 22 '20 03:09 anacrolix

Nice. I'm going to wait to see what some of the Go team says about which WSA errors do and don't count as temporary before submitting anything. The manual hacks might not be necessary if they just get rolled into go. If not, I'll have to break the check out into a build constrained file. (Sorry for the delay, I took a small injury (;´∀`)

For now, if anyone really needs it, here's the hacky patch.

diff --git a/server.go b/server.go
index eedfa81..efb5b6e 100644
--- a/server.go
+++ b/server.go
@@ -7,6 +7,8 @@ import (
 	"fmt"
 	"io"
 	"net"
+	"os"
+	"runtime"
 	"runtime/pprof"
 	"text/tabwriter"
 	"time"
@@ -24,6 +26,8 @@ import (
 	"github.com/anacrolix/stm"
 
 	"github.com/anacrolix/dht/v2/krpc"
+
+	"golang.org/x/sys/windows"
 )
 
 // A Server defines parameters for a DHT node server that is able to send
@@ -291,11 +295,37 @@ func (s *Server) processPacket(b []byte, addr Addr) {
 	s.deleteTransaction(tk)
 }
 
+// TODO: [anyone] remove this when Go itself considers these errors to be temporary errors
+func actuallyTemporary(nErr *net.OpError) bool {
+	var sErr *os.SyscallError
+	if errors.As(nErr.Err, &sErr) {
+		switch sErr.Err {
+		case windows.WSAENETRESET,
+			windows.WSAECONNRESET,
+			windows.WSAECONNABORTED,
+			windows.WSAECONNREFUSED,
+			windows.WSAENETUNREACH,
+			windows.WSAETIMEDOUT:
+			return true
+		}
+	}
+	return false
+}
+
 func (s *Server) serve() error {
 	var b [0x10000]byte
 	for {
 		n, addr, err := s.socket.ReadFrom(b[:])
 		if err != nil {
+			var nErr *net.OpError
+			if errors.As(err, &nErr) {
+				if nErr.Temporary() {
+					continue
+				}
+				if runtime.GOOS == "windows" && actuallyTemporary(nErr) {
+					continue
+				}
+			}
 			return err
 		}
 		expvars.Add("packets read", 1)

djdv avatar Sep 26 '20 13:09 djdv

Thanks @djdv . Could you link to the golang/go efforts to fix this?

anacrolix avatar Sep 28 '20 01:09 anacrolix

It looks like this is related to it https://github.com/golang/go/issues/35131 But the effort was abandoned. I'm going to look into it myself after I tackle something else first. Will post an update when things happen.

Likely just the temporary check will need to go in here and then newer version of Go will handle it. But if not, we'll need to break the other half out into a Windows constrained file.

djdv avatar Sep 29 '20 22:09 djdv

@djdv any update on this?

anacrolix avatar Oct 22 '21 10:10 anacrolix

I'm now seeing this in v2.17.0:

panic: read udp 0.0.0.0:54576: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.

goroutine 42 [running]:
github.com/anacrolix/dht/v2.(*Server).serveUntilClosed(0x150ce000)
	/go/pkg/mod/github.com/anacrolix/dht/[email protected]/server.go:256 +0xe3
created by github.com/anacrolix/dht/v2.NewServer
	/go/pkg/mod/github.com/anacrolix/dht/[email protected]/server.go:244 +0x60d

I'll make an up to date workaround in the short term.

anacrolix avatar Aug 11 '22 02:08 anacrolix

Same problem as https://github.com/anacrolix/torrent/issues/83.

anacrolix avatar Aug 12 '22 04:08 anacrolix

Fixed by https://github.com/anacrolix/dht/commit/f1f6c652cb216c69f54d0522dd002cbd41c54575.

anacrolix avatar Aug 22 '22 01:08 anacrolix

Is anyone still seeing this with v2.18.1?

anacrolix avatar Oct 18 '22 02:10 anacrolix

Sorry for not getting back to you on this. (I went mental IRL) Going to run an instance of the dht-server for a while and see if anything goes wrong. Promise not to take years to respond.

Also it's too late to mention it now, but it looks like Go ended up deprecating the whole Temporary method on net errors without any alternative (besides the original suggestion of manually comparing errors -- now via errors.Is -- to check against system specific errors, which aint great but is what it is)

// Deprecated: Temporary errors are not well-defined. // Most temporary errors are timeouts, and the few exceptions are surprising. // Do not use this method.

djdv avatar Oct 20 '22 21:10 djdv

No worries, thanks for contributing. I'm tempted to swap what I put in for the patch you provided above, especially if your testing shows what I have isn't sufficient.

anacrolix avatar Oct 20 '22 21:10 anacrolix

That might be the best for now. I'm no network expert, so I can't say for certain all of these errors can safely be ignored, but it's probably better to ignore them until someone says "hey that one shouldn't be ignored", rather than panicking. I say this because I got a panic almost immediately for WSAENETRESET (that's error 10052, not WSAECONNRESET/10054), which is conveniently not within the syscall package, but is in the windows pkg.

My log:

17:36:03 🏠 ⧽ bt-dht
2022-10-20T17:36:27-0400 NIL [main.loadTable:31]: loaded 0 nodes from table file
2022-10-20T17:36:27-0400 NIL [main.mainErr:72]: dht server on 192.168.1.40:40070, ID is 0f23c260e1c688e4d9b6a9e294d686b1fe36311a
panic: read udp 192.168.1.40:40070: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.

goroutine 6 [running]:
github.com/anacrolix/dht/v2.(*Server).serveUntilClosed(0xc000188000)
        C:/Users/Dominic Della Valle/Projects/Go/src/github.com/anacrolix/dht/server.go:256 +0xdc
created by github.com/anacrolix/dht/v2.NewServer
        C:/Users/Dominic Della Valle/Projects/Go/src/github.com/anacrolix/dht/server.go:244 +0x79c

djdv avatar Oct 20 '22 21:10 djdv

Err, just to be clear. The diff above should be changed to avoid calling Temporary, but that same list of error values should be checked instead of just the one being used right now.

Edit: keeping the build constraint like you have now is also better than doing it at runtime like in the quick hack I had.

djdv avatar Oct 20 '22 21:10 djdv

Thanks, I'll make that change soon.

anacrolix avatar Oct 21 '22 03:10 anacrolix

Update to v2.19.1.

anacrolix avatar Oct 21 '22 07:10 anacrolix

Ran an instance for a few hours. Seems to be working alright. Untitled

djdv avatar Oct 21 '22 11:10 djdv

@djdv thanks for the feedback and all the contribution you've made in this issue, it's really helped.

anacrolix avatar Oct 23 '22 06:10 anacrolix