opamp-go icon indicating copy to clipboard operation
opamp-go copied to clipboard

opamp.Connection needs a Mutex to avoid concurrent write panic

Open andykellr opened this issue 2 years ago • 2 comments

In a busy server, we encountered a concurrent write panic. I added protection in the server, but ideally the library would add a Mutex to prevent this from happening.

Here is the relevant stack fragment:

panic: concurrent write to websocket connection
goroutine 115 [running]:
github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc001dc95a8, 0x1, {0xc001d88160?, 0x2?, 0xc001dc95d8?})
        .../go/pkg/mod/github.com/gorilla/[email protected]/conn.go:617 +0x52b
github.com/gorilla/websocket.(*Conn).WriteMessage(0xc001da8f20, 0xc0003d2930?, {0xc001d88160, 0x142, 0x142})
        .../go/pkg/mod/github.com/gorilla/[email protected]/conn.go:770 +0x139
github.com/open-telemetry/opamp-go/server.wsConnection.Send({0xc001dc9750?}, {0xc001d950e0?, 0xc001dba050?}, 0x270c?)
        .../go/pkg/mod/github.com/open-telemetry/[email protected]/server/wsconnection.go:30 +0x55
...

andykellr avatar May 27 '22 15:05 andykellr

This is working as intended per the Connection interface:

https://github.com/open-telemetry/opamp-go/blob/3f2eab449870fb6ed667fc55a205a68bf97a0935/server/types/connection.go#L16-L21

I will do some benchmarking with a large number of connections to determine the overhead of adding a mutex per connection.

andykellr avatar May 31 '22 18:05 andykellr

FYI, we protect the connection with a mutex in our example. So it seems reasonable to just move this mutex inside the connection struct and have proper encapsulation.

One argument against having the mutex in the connection is that in the server implementation one may have a mutex per agent anyway (e.g. to protect agent's data fields) and that same mutex can be used to protect the connection. However, we don't do it in our example (we have a separate mutex for data fields) so perhaps the combined mutex is not such a good option (otherwise why don't we use it?).

tigrannajaryan avatar May 31 '22 19:05 tigrannajaryan

I missed this issue while working on #200. In my opinion having separate mutexes for the WS connection and the agent data fields makes sense because the cost of the mutexes is low and it would cause contention between updating the agent and sending it messages if we had a shared mutex.

@andykellr @tigrannajaryan Are we satisfied that this is resolved?

evan-bradley avatar Sep 21 '23 15:09 evan-bradley

Yes, I think we can close this.

tigrannajaryan avatar Sep 21 '23 16:09 tigrannajaryan