uWebSockets.js icon indicating copy to clipboard operation
uWebSockets.js copied to clipboard

res.cork(() => res.upgrade()) in Upgrade handler with ws.send() in Open handler = close connection

Open e3dio opened this issue 1 year ago • 9 comments

Doing as described in title can result in websocket dirty close connection, to prevent this can do setTimeout(() => ws.send('msg'), 0) if need to send message in Open handler after async Upgrade. I think calling res.upgrade() while corked calls the Open handler still corked and ws.send inside the upgrade cork can cause some tcp or protocol error. Removing res.cork() fixes connection error but displays warnings that cork is missing

e3dio avatar Oct 02 '24 04:10 e3dio

Do you have a full reproducer?

uNetworkingAB avatar Oct 03 '24 04:10 uNetworkingAB

Here is example it causes issue like 50% of the time, look for immediate socket close after open

import { App } from 'uWebSockets.js';

const port = 80;

App()

.get('/', r => r.end(`<script>
   ws = new WebSocket(location.origin+'/ws');
   ws.onopen = e => console.log('open');
   ws.onmessage = e => console.log('message', e.data);
   ws.onclose = e => console.log('close');
</script>`))

.ws('/ws', {
   upgrade: async (res, req, context) => {
      const wsHeaders = [ req.getHeader('sec-websocket-key'), req.getHeader('sec-websocket-protocol'), req.getHeader('sec-websocket-extensions') ];
      res.onAborted(() => res.aborted = true);
      await new Promise(r => setTimeout(r, 100));
      res.cork(() => res.upgrade({}, ...wsHeaders, context));
   },
   open: ws => console.log('open') || ws.send('w'.repeat(56)),
   message: ws => console.log('message'),
   close: ws => console.log('close'),
})

.listen(port, s => s && console.log(`http://localhost:${port}`));

e3dio avatar Oct 04 '24 21:10 e3dio

ws-bad2

e3dio avatar Oct 05 '24 20:10 e3dio

I believe I am observing a similar issue with my application. I was also able to reproduce it with the provided script. Here are my observations regarding the tests I made with the provided script.

The behavior is not consistent. Most of the time it works fine. To reproduce it I have to restart the script multiple times until the issue was observed (which happens around 1 time out of 5 to 10 tries). When it is observed, the issue occurs on all the requests (each time I refresh the page) repeatedly. However, I have observed that the server may revert back to the correct behavior after a bit of time (few minutes).

The server is closing the socket with the code 1006 and an empty message (also what I observe in my application).

  • I was able to reproduce the issue without sending anything in the open handler. So that does not seem to be a condition for the issue.
  • I was not able to reproduce the issue when removing the corking in the upgrade handler.
  • I was not able to reproduce the issue when removing the await instruction in the upgrade handler.

Here's a full packet capture of a single connection that observes the issue. We can see the server closing the connection right after the client acknowledged the first websocket packet:

Image

I tested using uWebSockets.js v20.51.0 (current latest).

Totalus avatar Mar 21 '25 16:03 Totalus

I tested with latest source build, confirmed issue with 1006 closed connection after async corked upgrade with ws.send in open handler. For me the error requires send in open. I do setTimeout(() => ws.send(), 100) to fix

e3dio avatar Mar 25 '25 01:03 e3dio

Had the same problem but in production only, with random disconnections on upgrade.

The problem seems to be with ws.send() either directly in open handler or asynchronously. So i think the problem is about the first call to ws.send().

I changed my code to allways cork every ws.send() call and it have fixed the problem. Maybe just corking the first call would be enough. But as the doc says: better be ridiculously corked.

RatchetS2 avatar Oct 11 '25 08:10 RatchetS2

Can you reproduce the issue with Valgrind?

valgrind nodejs server.js

uNetworkingAB avatar Oct 20 '25 02:10 uNetworkingAB

Experienced the same issue, my fix was removing async completely.. I kept getting 1006. (like 8 out of 10). This only happened in production, seems the delay introduced of the async operations, from dev to prod, is, enough to cause it. Now I just upgrade immediately, without checks, and validate and patch everything in the open handler.. :(

pixelzor avatar Dec 01 '25 16:12 pixelzor

ok something is broken here, will check

uNetworkingAB avatar Dec 01 '25 16:12 uNetworkingAB