websocket icon indicating copy to clipboard operation
websocket copied to clipboard

sporadic runtime error: send on closed channel

Open gordonklaus opened this issue 10 years ago • 7 comments
trafficstars

I am sporadically seeing this error at: https://github.com/gopherjs/websocket/blob/master/conn.go#L130 Presumably it could also happen at: https://github.com/gopherjs/websocket/blob/master/conn.go#L122

Because the sends are done in separate goroutines, they aren't guaranteed to execute anytime soon – in particular not before c.ch is eventually closed.

Might I recommend, as JS runs single-threaded, simply to queue messages in a slice rather than a chan? No synchronization necessary, no need to spin up goroutines -> consistent results! Hopefully ;-)

gordonklaus avatar May 09 '15 07:05 gordonklaus

Ah, silly me. Of course you need synchronization in order to do blocking Read.

gordonklaus avatar May 09 '15 08:05 gordonklaus

The problem seems to go away if I increase the buffer size of Conn.ch. Although I suppose it would crop up again with a slow Reader under heavy messaging load.

I should correct my initial reasoning about this bug: I only expect the error to occur at: https://github.com/gopherjs/websocket/blob/master/conn.go#L130 and never at: https://github.com/gopherjs/websocket/blob/master/conn.go#L122 because c.ch <- nil will never come after the channel is closed. But the order of these two sends is indeterminate, and thus the non-nil send may execute long after the nil send, and even after the channel is closed.

gordonklaus avatar May 09 '15 08:05 gordonklaus

Despite my ironclad reasoning above, I am observing "send on closed channel" in onClose. I have verified that onClose is the only instance sending nil, and that it only tries to do it once, but that it somehow tries to do so after the channel has already been closed. Baffled.

gordonklaus avatar May 09 '15 11:05 gordonklaus

I am seeing this too, in (as far as I can tell) up to date versions of gopherjs and this websocket package.

My server sends two messages and closes the websocket. The websocket package receives the first and puts it on the conn channel, where another waiting goroutine immediately reads it. The websocket receives the second message and queues it (filling the channel's one-item capacity). The onClose() then runs, which tries to send a nil, which blocks, except that it seems to queue the nil and then block, don't ask me how.

At that point, the goroutine that's reading the channel services the second message, and then gets the nil that the onClose sent, and calls handleFrame() in conn.go, which closes the channel. Then the onClose() goroutine is finally scheduled again, just after the Javascript $send() call, but before the check to see if the channel is closed. Which it now is closed, so it throws a "send on closed channel" error.

In the Chrome console, if I click on the $b function in the backtrace from the error, I land in the compiled JS of conn.go's onClose() function, with the cursor in the indicated spot (linebreaks added):

$r = $send(c.ch, ptrType$3.nil);
$s = 1;
case 1: if($c) { $c = false; $r = $r.$blk(); }
                                  // ^cursor here, right before $blk()

$r.$blk() is what actually checks for a closed channel and throws the error.

I too made the error go away by increasing the channel capacity from 1 to 2, but agree that that seems like hiding the problem, not fixing it.

I also got the error to go away, even with a capacity of zero on that channel, by calling close(c.ch) in handleFrame() in its own goroutine.

Before (broken):

} else if message == nil {
    // See onClose for the explanation about sending a nil item.
    close(c.ch)
    return nil, io.EOF
}

After (seems to work in light testing):

} else if message == nil {
    // See onClose for the explanation about sending a nil item.
    go func() {
        close(c.ch)
    }()
    return nil, io.EOF
}

This probably isn't the best solution, but it may point the way to the problem or a better solution. Putting the close in a goroutine of its own seems to let the check-for-close after the nil is sent complete successfully.

This looks like it might possibly be a gopherjs error, so I'm going to tag @shurcooL. I don't have any more time tonight or I'd try my hand at creating a minimal test case in the playground.

theclapp avatar Jun 16 '16 02:06 theclapp

Hey, thanks for the detailed analysis and the mention @theclapp. I am absolutely interested in looking at this issue and finding an optimal solution. But realistically, I won't be able to get to it for some time (maybe this weekend, but more likely a few weekends away). I'll add it to my backlog of tasks to get to so I don't forget. Thanks! :)

dmitshur avatar Jun 16 '16 08:06 dmitshur

Minimal (or at least, really small) test case in the playground, http://www.gopherjs.org/playground/#/RyMy65T8Kr

package main

func main() {
    ch := make(chan bool)
    go func() {
        ch <- false
    }()
    go func() {
        select {
        case <-ch:
            close(ch)
        }
    }()
}

Result: JS Console: Uncaught Error: runtime error: send on closed channel

Actually this kind of clinches that this isn't a websocket bug. Will file a bug on gopherjs.

theclapp avatar Jun 17 '16 20:06 theclapp

Awesome find @theclapp! We'll definitely need to resolve this bug in GopherJS first, then it's likely this WebSocket issue will either go away, or be easier to fix.

dmitshur avatar Jun 19 '16 22:06 dmitshur