dtls icon indicating copy to clipboard operation
dtls copied to clipboard

Client never recovers/disconnects lost connection from server restart

Open hjames9 opened this issue 4 years ago • 10 comments

If a client is continuously sending messages on a properly established DTLS connection, if the server dies and restarts, the client never detects the server is down or attempt to reestablish the connection.

Here's an example of a client:

package main

import (
    "log"
    "net"
    "time"

    "github.com/pion/dtls"
)

func main() {
    addr := &net.UDPAddr{IP: net.ParseIP("127.0.0.1"), Port: 6553}
    certificate, privateKey, err := dtls.GenerateSelfSigned()
    if err != nil {
        log.Fatal(err)
        return
    }   

    config := &dtls.Config{
        Certificate:          certificate,
        PrivateKey:           privateKey,
        ExtendedMasterSecret: dtls.RequireExtendedMasterSecret,
        ConnectTimeout:       dtls.ConnectTimeoutOption(30 * time.Second),
        InsecureSkipVerify:   true,
    }   

    for {
        conn, err := dtls.Dial("udp", addr, config)
        if err != nil {
            log.Fatal(err)
            return
        }
        defer conn.Close()
        log.Println("Established connection...")

        for {
            message := []byte("Sup youngin...")
            amt, err := conn.Write(message)
            if err != nil {
                log.Println(err)
                break
            }
            log.Printf("Successfully wrote %d bytes", amt)

            time.Sleep(3 * time.Second)
        }
    }   
}

Here's an example of a server receiving the message:

package main

import (
    "log"
    "net"
    "time"

    "github.com/pion/dtls"
)

func main() {
    addr := &net.UDPAddr{IP: net.ParseIP("127.0.0.1"), Port: 6553}
    certificate, privateKey, err := dtls.GenerateSelfSigned()
    if err != nil {
        log.Fatal(err)
        return
    }   

    config := &dtls.Config{
        Certificate:          certificate,
        PrivateKey:           privateKey,
        ExtendedMasterSecret: dtls.RequireExtendedMasterSecret,
        ConnectTimeout:       dtls.ConnectTimeoutOption(30 * time.Second),
        InsecureSkipVerify:   true,
        ClientAuth:           dtls.RequireAnyClientCert,
    }   

    listener, err := dtls.Listen("udp", addr, config)
    if err != nil {
        log.Fatal(err)
        return
    }   
    defer func() {
        listener.Close(5 * time.Second)
    }() 

    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Fatal(err)
            continue
        }

        go communicateData(conn)
    }   
}

func communicateData(conn net.Conn) {
    buffer := make([]byte, 1024)
    for {
        defer conn.Close()
        amt, err := conn.Read(buffer)
        if err != nil {
            break
        }
        log.Printf("%s\n", string(buffer[:amt]))
    }
}

Would expect some way for the client to realize that it has to resend a message because the previous DTLS session is broken, but currently reading and writing on the session is still successful.

Your environment.

  • Version: Latest GIT tag

hjames9 avatar Oct 27 '19 22:10 hjames9

UDP doesn't really have connections, so we'd basically have to have a mechanism client side that's configured with a value for a timeout, after which we consider the server to be gone and attempt to re-establish a DTLS session. I'm not sure this needs to be part of the DTLS library, this is a reality of using UDP that you need to handle yourself, as what we consider to be a broken "connection" in UDP is highly application specific.

You need to do something like this in your client:

deadline := time.Now().Add(*timeout)
conn.SetReadDeadline(deadline)

nRead, addr, err := conn.ReadFrom(buffer)
if err != nil {
    ...
}

This only works if your client expects to always get a reply though, which is not necessarily the case with UDP.

daenney avatar Oct 28 '19 09:10 daenney

I would imagine that if the server restarts you'd be getting errors though, as the TLS session is no longer valid. Though I suppose you'll only ever see that if you call Read() on the conn b/c that's the point where we try to decrypt. If the server just dies, as in goes away but never comes back, you'd have to use to a timeout on the conn or have some application level specific ping-pong mechanism that lets your client detect the server is gone.

daenney avatar Oct 28 '19 09:10 daenney

Turns out I'm wrong on being able to detect a restart. DTLS implementations silently ignore invalid records, so there's no way to detect that.

Unlike TLS, DTLS is resilient in the face of invalid records (e.g., invalid formatting, length, MAC, etc.). In general, invalid records SHOULD be silently discarded, thus preserving the association; however, an error MAY be logged for diagnostic purposes. Implementations which choose to generate an alert instead, MUST generate fatal level alerts to avoid attacks where the attacker repeatedly probes the implementation to see how it responds to various types of error. Note that if DTLS is run over UDP, then any implementation which does this will be extremely susceptible to denial-of-service (DoS) attacks because UDP forgery is so easy. Thus, this practice is NOT RECOMMENDED for such transports.

Basically, DTLS only introduced reliability for the purpose of being able to complete a (modified) TLS handshake over a datagram transport. After that, it's back to being as reliable as the transport itself, so in the case of UDP not at all.

You'll need an application level mechanism, i.e a request-response cycle that you expect to complete within a certain window of time, to detect an issue like a server having gone away or having restarted and lost the session.

daenney avatar Oct 28 '19 11:10 daenney

Got it, makes sense. Thanks for analysis. Agreed that it seems that this should be more handled by the application side as per the nature of DTLS/UDP. However, do you think the library should expose more protocol details to aid in that? For instance should the sequence numbers on the DTLS packets be exposed in order for the application to detect gaps or maybe provide callbacks for a library gap detection? It seems like the sequence number was designed to aid in these scenarios.

hjames9 avatar Oct 28 '19 11:10 hjames9

@daenney Also, do you think alerting (generating and receiving), should be exposed at all?

hjames9 avatar Oct 28 '19 12:10 hjames9

Hey @hjames9

We can expose statistics if that is helpful! I think that would be a great contribution, I am not sure what that would look exactly like yet though. If we can copy things that crypto/tls or OpenSSL do would be a great start.

If not we can start with basics like bytes transmitted/received.

One thing you could do right now is set a custom logger and look at all the log messages we emit. You can pass in the constructor your own Logger instance and then process every message we generate (and react to them)

Sean-Der avatar Nov 10 '19 09:11 Sean-Der

@daenney Understood your explanation on alerting i.e. should only be diagnostic and should be fatal for the session if used, but in general is not a best practice.

@Sean-Der Keeping live statistics might be helpful in this situation. I'll educate myself on how crypto/tls handles it and see if anything useful can be replicated.

Thanks!

hjames9 avatar Nov 10 '19 16:11 hjames9

@Sean-Der Can you point me to where an API like this exists in crypto/tls? I can try and replicate something similar here.

hjames9 avatar May 11 '20 04:05 hjames9

ref: https://www.cs.ru.nl/bachelors-theses/2018/Niels_Drueten___4496604___Security_analysis_of_DTLS_1_2_implementations.pdf page 14: "Closing a DTLS connection the implementation should send a close notify alert. This way, the peer knows the connection is ending"

tigersean avatar Mar 04 '21 01:03 tigersean

Hey @tigersean

We close when we get a Close Alert

We also send a Close Alert when the user closes the connection.

Sean-Der avatar Mar 05 '21 03:03 Sean-Der