Overlapping WebSocket messages close the socket
**Describe the bug** When the client and EmbedIO exchange messages heavily on a WebSocket channel, receiving and responding eventually overlap, especially with large messages. This results in the connection being closed, presumably from the client (browser) side:
WebSocket connection to 'ws://localhost:8080/event' failed: Received start of new message but previous message is unfinished.
Have a look at this sample project simulating this behavior. It sends garbage text from and to the browser and waits randomly in the WebSocketModule to simulate actual work being done.
**To Reproduce** Steps to reproduce the behavior:
- Checkout the example project at https://github.com/bdurrer/embedio-websocket-example
- Start the server (your editor might have a different output path than mine; check that the html-root folder path is correct in the logged output)
- Open http://localhost:8080/ in the browser
- Open the browser console
- Hit the spam button
- Wait until you get a red error shown in the console. It should take only seconds. (You can stop the spam by clicking the button again)
**Expected behavior** Heavy usage of WebSockets should not make them fail. My understanding is that WebSockets are capable of this, and that I should not have to create some complicated locking setup for a simple ping-pong style WebSocket. (If WebSockets couldn't handle this, they would be a bad choice for chat software, which has no way of predicting when, or in which direction, the next message will travel.)
I might just have implemented the WebSocket incorrectly. If that's the case, I hope the outcome of this issue is a good example of a correct WebSocketModule 😊
**Screenshots** None.
**Desktop (please complete the following information):**
- OS: Windows 10
- Browser: Chrome, Firefox
- EmbedIO 3.4.3
**Smartphone (please complete the following information):**
- does not apply
**Additional context** I noticed this while converting a database-accessing API for an Electron application that uses a local EmbedIO server. I mentioned this in the EmbedIO Slack on the 7th of January; this is the corresponding bug report.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this needs investigation, IMO something is wrong with websockets
Hello @bdurrer, thanks for bearing with us. 🤦‍♂️ I have reopened and pinned this issue because it most definitely needs investigation. I hope I or @geoperez will be able to look into it soon.
Hello @rdeago, do you have any news regarding this topic? WebSockets could become a topic for me in the future, too.
While backporting #534 to version 3 I noticed that, unfortunately, @radioegor146's fix was incomplete, so I am reopening it.
The issue affects both WebSocket implementations
The first reason why the fix is incomplete is that the issue also affects HttpListenerMode.Microsoft. Quoting the "Remarks" section here (emphasis mine):
> Exactly one send and one receive is supported on each WebSocket object in parallel.
This makes sense, as we obviously cannot write different data at the same time to the same socket. It also makes sense that there's no synchronization mechanism already in place for such an occurrence[^1], as it would incur an unnecessary (albeit small) performance hit in situations where a WebSocket is not read from or written to in more than one thread. Think, for example, of a simple command/response protocol with no broadcasts involved: there is no point in synchronizing anything, as there will never be two or more simultaneous writes.
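The no-overlapping-writes requirement is language-agnostic. As a minimal sketch (TypeScript here purely for illustration; EmbedIO's actual code is C# with a `SemaphoreSlim`), concurrent sends can be serialized through a promise-chain mutex so that two logical messages never interleave on the wire:

```typescript
// Minimal promise-chain mutex: each operation waits for the previous one
// to finish, so "writes" are serialized even when callers overlap.
class SendMutex {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(op: () => Promise<T>): Promise<T> {
    const result = this.tail.then(op, op);
    this.tail = result.catch(() => undefined); // keep the chain alive on failure
    return result;
  }
}

// Demo: two overlapping "sends". Without the mutex their start/end events
// could interleave; with it they run strictly one after the other.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const mutex = new SendMutex();
  const send = (msg: string, delayMs: number) =>
    mutex.run(async () => {
      log.push(`start ${msg}`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      log.push(`end ${msg}`);
    });
  await Promise.all([send("A", 20), send("B", 0)]);
  return log;
}
```

Even though "B" finishes its simulated work faster, it only starts after "A" has fully completed, which is exactly the property a data frame needs on the socket.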
A close may imply a write
The second reason why we need a better fix, also affecting both HttpListenerModes, is more subtle.
When CloseAsync is called on a WebSocket, it does not "close" it at once: rather, it initiates the closing handshake[^2], which implies sending a "close" packet and, depending on the CloseStatusCode, waiting for one in response. The "sending" part obviously entails a write on the TCP socket, so it cannot execute at the same time as a SendAsync.
The solution (hopefully)
@radioegor146's fix is simple and almost gets the job done; I just have to add a wait on the same semaphore in CloseAsync.
In addition, the same synchronization logic shall be applied to SystemWebSocket.
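The shape of the proposed fix can be sketched as follows (TypeScript for illustration only; the real fix is C# waiting on a `SemaphoreSlim`, and all names below are hypothetical): `send()` and `close()` funnel their writes through the same lock, so the close frame can never interleave with an in-flight data frame.

```typescript
// Illustrative only: both send() and close() take the same lock,
// mirroring "add a wait on the same semaphore in CloseAsync".
class GuardedSocket {
  private tail: Promise<unknown> = Promise.resolve();
  readonly frames: string[] = []; // stands in for bytes written to the TCP socket

  private locked<T>(op: () => Promise<T>): Promise<T> {
    const result = this.tail.then(op, op);
    this.tail = result.catch(() => undefined);
    return result;
  }

  send(data: string): Promise<void> {
    return this.locked(async () => {
      this.frames.push(`data:${data}`);
    });
  }

  close(): Promise<void> {
    // Sending the "close" packet is a write too, so it takes the same lock.
    return this.locked(async () => {
      this.frames.push("close");
    });
  }
}
```

With this in place, `await Promise.all([s.send("x"), s.close()])` always writes the data frame first and the close frame second, never a mix of the two.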
Call for comments
@radioegor146 @geoperez @mariodivece @bdurrer as well as anyone else reading this; do you agree with the solution above? Did I overlook something?
[^1]: Actually, there is some synchronization, at least in the managed HttpWebSocket used on Mac and Linux systems, but I'm not at all sure it also applies to Windows. I think we'd better assume the quoted remark is there for a reason.
[^2]: That's if the closing handshake has not yet been started. If it has, either by receiving a "close" packet or by a previous call to CloseAsync, then CloseAsync becomes a no-op.
Nice catch! I saw the fix and thought "doh, I could have seen that myself" 😅 Does it matter if closing fails? Wouldn't the socket end up (uncleanly) closed anyway? I suggest waiting on close with a timeout, to prevent threads from getting stuck for whatever reason.
Yep, definitely: we need synchronization in the CloseAsync call, as well as in the .NET wrapper. I also think that implementing some timeout mechanism in close would be good for security reasons, because if the client never acknowledges the server's close packet, the socket can get stuck in "closing".
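The timeout idea can be sketched as a race between the close handshake and a timer (TypeScript, illustrative only; `closeHandshake` and `abort` are hypothetical stand-ins for the actual close and hard-abort operations):

```typescript
// If the close handshake does not complete within `ms`, stop waiting
// and hard-abort the connection instead of leaving a thread stuck.
function closeWithTimeout(
  closeHandshake: () => Promise<void>,
  abort: () => void,
  ms: number,
): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("close timed out")), ms);
  });
  return Promise.race([closeHandshake(), timeout])
    .catch(() => abort()) // timed out (or close failed): abort uncleanly
    .finally(() => {
      if (timer !== undefined) clearTimeout(timer);
    });
}
```

A client that never answers the close packet then costs at most `ms` milliseconds of waiting before the socket is torn down.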
@bdurrer and @radioegor146 you both make good points.
When CloseAsync fails, unless the cause is a bad parameter of course, IMHO the WebSocket should be aborted anyway. The rationale is that the user has expressed the intention to stop using the connection.
For the same reason, we need to stop sending as soon as the parameters to CloseAsync are validated; I'm thinking of an additional WebSocketState constant (CloseRequested?), defined as "the state is not yet CloseSent, but CloseAsync has been called with valid parameters".
Let's see if I can summarize:
- there should be a thread-safe flag that tells whether `CloseAsync` has been called with valid parameters and closing has therefore commenced;
- `SendAsync` should check the flag and fail if it is set, the same way it now fails if the state is e.g. `CloseSent`;
- received messages should be discarded when the flag is set, for consistency with the other `Close*` states;
- all of the above should apply to both `WebSocket` and `SystemWebSocket`;
- we should have an `EmbedIO.WebSockets.WebSocketState` enum with the same values as `System.Net.WebSockets.WebSocketState` plus a `CloseRequested` state;
- the `SystemWebSocket.State` getter should return `CloseRequested` when the underlying `System.Net.WebSockets.WebSocket`'s `State` is `Open` and the flag is set;
- the internal `WebSocket` class should set its state to `CloseRequested` when it raises the flag, i.e. after validating `CloseAsync`'s parameters.
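The summarized behavior amounts to a small state machine, sketched here in TypeScript for illustration (the state names mirror the proposed enum; everything else is hypothetical):

```typescript
// Illustrative state machine for the proposed CloseRequested handling.
enum WsState {
  Open,
  CloseRequested, // close called with valid parameters, close frame not yet sent
  CloseSent,
  Closed,
}

class StatefulSocket {
  state: WsState = WsState.Open;
  readonly received: string[] = [];

  send(data: string): void {
    // Fails in CloseRequested the same way it already fails in CloseSent.
    if (this.state !== WsState.Open) {
      throw new Error(`cannot send in state ${WsState[this.state]}`);
    }
    // ... write a data frame carrying `data` ...
  }

  requestClose(): void {
    // The flag is raised as soon as the close parameters are validated.
    if (this.state === WsState.Open) {
      this.state = WsState.CloseRequested;
    }
  }

  onMessage(data: string): void {
    // Incoming messages are discarded once closing has commenced,
    // consistent with the other Close* states.
    if (this.state === WsState.Open) {
      this.received.push(data);
    }
  }
}
```

So after `requestClose()`, a `send()` throws and incoming messages are silently dropped, while messages received before the close request are kept.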
To be honest, I suspect that the absence of the CloseRequested state in Microsoft's code is a bug. Not in the .NET runtime, but rather in the http.sys driver, whose functionality the managed HttpListener and related types are compelled to duplicate.
Or maybe I'm just overthinking and overengineering. 😄 Thoughts?
Hi, any news about this issue? I'm using the latest version of EmbedIO...