go-coap icon indicating copy to clipboard operation
go-coap copied to clipboard

Connection from client not reestablished when server restarts using DTLS

Open hjames9 opened this issue 6 years ago • 7 comments

Using server from this example: https://github.com/go-ocf/go-coap/blob/master/examples/dtls/server/main.go i.e. "go run main.go"

And slightly modified client to continuously call coap.Get() every 3 seconds to send/receive data https://github.com/go-ocf/go-coap/blob/master/examples/dtls/client/main.go

    co, err := coap.DialDTLS("udp", "localhost:5688", &dtls.Config{
        PSK: func(hint []byte) ([]byte, error) {
            fmt.Printf("Server's hint: %s \n", hint)
            return []byte{0xAB, 0xC1, 0x23}, nil 
        },
        PSKIdentityHint: []byte("Pion DTLS Server"),
        CipherSuites:    []dtls.CipherSuiteID{dtls.TLS_PSK_WITH_AES_128_CCM_8},
    })  
    if err != nil {
        log.Fatalf("Error dialing: %v", err)
    }   
    for {
        path := "/a"
        if len(os.Args) > 1 { 
            path = os.Args[1]
        }
        resp, err := co.Get(path)
        if err != nil {
            log.Fatalf("Error sending request: %v", err)
        }

        log.Printf("Response payload: %v", resp.Payload())
        time.Sleep(3 * time.Second)
    }

If I restart the server process, the client never recovers and detects that the Get() fails. Get() blocks indefinitely and you're never able to determine that the connection fails and attempt to reestablish it via Dial(). I'm unsure if I'm missing something, but what's the best practice to follow in this scenario?

hjames9 avatar Oct 27 '19 21:10 hjames9

Another issue (related, but problem with the server instead of the client). When a server goes down and comes back, with a client being unaware and continues to send messages, the coap.ListenAndServeDTLS() on the server returns with failure "cannot serve tcp: cannot accept connections: dtls: The connection timed out during the handshake"

If you attempt to call coap.ListenAndServeDTLS() on the same port, after it returns, it fails binding the port: "cannot listen and serve: cannot create new dtls listener: listen udp :5688: bind: address already in use"

Looks like two issues.

  1. Doesn't seem like ListenAndServerDTLS() should return failure if one client happens to be in a different state and is sending messages (perhaps the handshaking process shouldn't be triggered at all here?)
  2. If/When ListenAndServerDTLS() does return, it doesn't seem as if it's properly cleaning up the port it's listening on.

hjames9 avatar Oct 29 '19 02:10 hjames9

Hi

Instead call resp, err := co.Get(path) you can use resp, err := co.GetWithContext(ctx, path) and in context you can define timeout.

Doesn't seem like ListenAndServerDTLS() should return failure if one client happens to be in a different state and is sending messages (perhaps the handshaking process shouldn't be triggered at all If/When ListenAndServerDTLS() does return, it doesn't seem as if it's properly cleaning up the port it's listening on.

I will look at it on next week.

jkralik avatar Nov 01 '19 17:11 jkralik

@jkralik I looked more into this and it appears that the DTLS code doesn't read any of the packet headers before trying to establish a new connection/session when it receives a message it doesn't know anything about. So it starts doing the DTLS handshake logic even though it's receiving application data packets.

Listener starting readLoop on new session: https://github.com/pion/dtls/blob/0649d6e970ceb0913223f5696be74474788d0a9b/internal/udp/conn.go#L100

Listener fetching conn/session object: https://github.com/pion/dtls/blob/0649d6e970ceb0913223f5696be74474788d0a9b/internal/udp/conn.go#L118

Listener creating new conn/session without verifying incoming packet is in the correct state: https://github.com/pion/dtls/blob/0649d6e970ceb0913223f5696be74474788d0a9b/internal/udp/conn.go#L137

hjames9 avatar Nov 12 '19 23:11 hjames9

@jkralik So that explains the handshake timeout. I think the DTLS implementers would argue that this behavior is fine as UDP forces the application to deal with these scenarios as it's connectionless. But the ListenAndServerDTLS function should still not return based on one failed/faulty session. Still trying to track that down.

hjames9 avatar Nov 13 '19 00:11 hjames9

@jkralik This looks like the incorrect behavior in go-coap. https://github.com/go-ocf/go-coap/blob/fd84fc2e6f05d98a193b1fb598523c7521b779ce/server.go#L536

I don't think we would want to exit this loop on one session failing. I've also noticed something similiar with the TCP listener.

https://github.com/go-ocf/go-coap/blob/fd84fc2e6f05d98a193b1fb598523c7521b779ce/server.go#L605

I think the more correct behavior would to either log a message and continue the loop or change the API to provide an error callback when these situations happen? What do you think? I can provide a PR with any of these types of changes.

Thanks!

hjames9 avatar Nov 13 '19 00:11 hjames9

@hjames9 It's expected that this error occurs only when listen socket was closed. When you shutdown server you get error - this I want to to change too -> it must returns without error message.

Maybe for some errors we can just log them via callback function.

Pls create PR and I will review it. I'm so glad that you are contributing. Thx!

jkralik avatar Nov 13 '19 09:11 jkralik

@jkralik Most of the issues I had were resolved with these last 2 PRs. The issue of the listening port not being closed once the ListenAndServe function returns still remains, but I believe that's how it works similarly to how net/http, so it might be ok. What do you think?

hjames9 avatar May 11 '20 16:05 hjames9