deno
deno copied to clipboard
bug: sending data to websocket after its closed randomly panics
Currently I'm sending payload to WebSocket wrapped in a try/catch because I want to ignore the errors as they are handled with reconnects in close event, but now Deno randomly panics sometimes with this error instead while no close event fires on JS side, like once a day.
thread 'main' panicked at 'Error 'WebSocket protocol error: Sending after closing is not allowed' contains boxed error of unknown type: Protocol(SendAfterClosing)', cli/errors.rs:35:7
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Backtrace:
I cannot find a minimal reproduction. For context, I'm using WebSocket to interact with Discord Gateway.
deno v1.10.3 using deno docker (debian)
cc @crowlKats
I can reproduce this websocket panic with Deno 1.15.1.
I setup an echo server and a (reconnecting) client that pumps messages to the server. Right before the ws.send
, I set a short timeout (3ms) and closed the connection. There appears to be a race condition that happens while the server is sending the response during which time the connection is closed.
Here is the backtrace with RUST_BACKTRACE=full
.
thread 'main' panicked at 'Error 'Connection closed normally' contains boxed error of unknown type:
ConnectionClosed', cli/errors.rs:73:7
stack backtrace:
0: 0x1060d7f53 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d168763f08a78b9
1: 0x10563e2cc - core::fmt::write::h0fa46d4891ed63f1
2: 0x1060d5e4a - std::io::Write::write_fmt::hacbf9e10e17149ce
3: 0x1060d708f - std::panicking::default_hook::{{closure}}::hee7cf38dedfa76fc
4: 0x1060d625d - std::panicking::default_hook::hd12252412832a45f
5: 0x1059a6061 - deno::setup_exit_process_panic_hook::{{closure}}::h70933608affb3773
6: 0x1060d5cf9 - std::panicking::rust_panic_with_hook::h521fd09e07057893
7: 0x1060f43fa - std::panicking::begin_panic_handler::{{closure}}::h3588a5a9cca9e8e5
8: 0x1060f4368 - std::sys_common::backtrace::__rust_end_short_backtrace::h86522c151a195ff9
9: 0x1060f4310 - _rust_begin_unwind
10: 0x10727716f - std::panicking::begin_panic_fmt::hf4ebaa1b90291b0b
11: 0x10574fa2a - deno::errors::get_error_class_name::{{closure}}::h2ee2df1a6d33aa7e
12: 0x1056a0641 - deno::errors::get_error_class_name::ha85d240f5e3c0e80
13: 0x105bcf171 - deno_core::ops::serialize_op_result::hd11c6b5e9b89dd12
14: 0x105d2d172 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::hc0e8adb3f8c69ec2
15: 0x105a354cb - futures_util::stream::stream::StreamExt::poll_next_unpin::h741bad71a4187364
16: 0x105a3039d - deno_core::runtime::JsRuntime::poll_event_loop::h006bc5cbfab09c79
17: 0x1056892f9 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h6f23e6a72a8d6e8d
18: 0x105688fac - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf543fa3670f831a0
19: 0x10577df7c - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hde39aa66f49436e3
20: 0x10577bb53 - deno::run_command::{{closure}}::ha9495f086a3aef91
21: 0x10577a8fd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf58f70de9058b631
22: 0x10565e30f - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h49583a8e490b5741
23: 0x10564f383 - deno::main::h4d4e42da58ebceb5
24: 0x10564b069 - std::sys_common::backtrace::__rust_begin_short_backtrace::h51ec75f4dce6bb35
25: 0x10564b2c5 - _main
Here is the echo-server.ts
:
//echo-server.ts
const listener = Deno.listen({ port: 5000 });
for await (const conn of listener) {
handleNewConnection(conn);
}
async function handleNewConnection(conn: Deno.Conn) {
const httpConn = Deno.serveHttp(conn);
for await (const { request: req, respondWith: res } of httpConn) {
const { socket: ws, response: resp } = Deno.upgradeWebSocket(req);
ws.onmessage = (m) => {
setTimeout(() => {
try {
ws.send(m.data);
} catch (_e) {
// ignore the errors
}
}, 0);
};
res(resp);
}
}
And the echo-client.ts
let ws = new WebSocket("ws://localhost:5000");
setInterval(() => {
const m = crypto.randomUUID();
setTimeout(() => {
ws.close();
}, 3);
try {
ws.send(m);
} catch (_e) {
// ignore errors
}
}, 5);
const showMessage = (m: any) => {
console.log("<<<", m.data);
};
const reconnect = () => {
ws.onmessage = null;
ws.onerror = null;
ws.onclose = null;
ws = new WebSocket("ws://localhost:5000");
ws.onmessage = showMessage;
ws.onclose = reconnect;
};
ws.onmessage = showMessage;
ws.onclose = reconnect;
I can even get it to fail without the reconnecting part, but sometimes I have to start and stop the client a few times.
Also, on occasion I get this backtrace. Similar, but not exactly the same:
thread 'main' panicked at 'Error 'IO error: Broken pipe (os error 32)' contains boxed error of unknown type:
Io(Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })
Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', cli/errors.rs:73:7
stack backtrace:
0: 0x104f82f53 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d168763f08a78b9
1: 0x1044e92cc - core::fmt::write::h0fa46d4891ed63f1
2: 0x104f80e4a - std::io::Write::write_fmt::hacbf9e10e17149ce
3: 0x104f8208f - std::panicking::default_hook::{{closure}}::hee7cf38dedfa76fc
4: 0x104f8125d - std::panicking::default_hook::hd12252412832a45f
5: 0x104851061 - deno::setup_exit_process_panic_hook::{{closure}}::h70933608affb3773
6: 0x104f80cf9 - std::panicking::rust_panic_with_hook::h521fd09e07057893
7: 0x104f9f3fa - std::panicking::begin_panic_handler::{{closure}}::h3588a5a9cca9e8e5
8: 0x104f9f368 - std::sys_common::backtrace::__rust_end_short_backtrace::h86522c151a195ff9
9: 0x104f9f310 - _rust_begin_unwind
10: 0x10612216f - std::panicking::begin_panic_fmt::hf4ebaa1b90291b0b
11: 0x1045faa2a - deno::errors::get_error_class_name::{{closure}}::h2ee2df1a6d33aa7e
12: 0x10454b641 - deno::errors::get_error_class_name::ha85d240f5e3c0e80
13: 0x104a7a171 - deno_core::ops::serialize_op_result::hd11c6b5e9b89dd12
14: 0x104bd8172 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::hc0e8adb3f8c69ec2
15: 0x1048e04cb - futures_util::stream::stream::StreamExt::poll_next_unpin::h741bad71a4187364
16: 0x1048db39d - deno_core::runtime::JsRuntime::poll_event_loop::h006bc5cbfab09c79
17: 0x1045342f9 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h6f23e6a72a8d6e8d
18: 0x104533fac - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf543fa3670f831a0
19: 0x104628f7c - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hde39aa66f49436e3
20: 0x104626b53 - deno::run_command::{{closure}}::ha9495f086a3aef91
21: 0x1046258fd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf58f70de9058b631
22: 0x10450930f - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h49583a8e490b5741
23: 0x1044fa383 - deno::main::h4d4e42da58ebceb5
24: 0x1044f6069 - std::sys_common::backtrace::__rust_begin_short_backtrace::h51ec75f4dce6bb35
25: 0x1044f62c5 - _main
And if we flip it around, and the server sends the messages to the client, while setting a short timer to close the connection, then you get the boxed error that @DjDeveloperr got:
thread 'main' panicked at 'Error 'WebSocket protocol error: Sending after closing is not allowed' contains boxed error of unknown type:
Protocol(SendAfterClosing)', cli/errors.rs:73:7
stack backtrace:
0: 0x102951f53 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d168763f08a78b9
1: 0x101eb82cc - core::fmt::write::h0fa46d4891ed63f1
2: 0x10294fe4a - std::io::Write::write_fmt::hacbf9e10e17149ce
3: 0x10295108f - std::panicking::default_hook::{{closure}}::hee7cf38dedfa76fc
4: 0x10295025d - std::panicking::default_hook::hd12252412832a45f
5: 0x102220061 - deno::setup_exit_process_panic_hook::{{closure}}::h70933608affb3773
6: 0x10294fcf9 - std::panicking::rust_panic_with_hook::h521fd09e07057893
7: 0x10296e3fa - std::panicking::begin_panic_handler::{{closure}}::h3588a5a9cca9e8e5
8: 0x10296e368 - std::sys_common::backtrace::__rust_end_short_backtrace::h86522c151a195ff9
9: 0x10296e310 - _rust_begin_unwind
10: 0x103af116f - std::panicking::begin_panic_fmt::hf4ebaa1b90291b0b
11: 0x101fc9a2a - deno::errors::get_error_class_name::{{closure}}::h2ee2df1a6d33aa7e
12: 0x101f1a641 - deno::errors::get_error_class_name::ha85d240f5e3c0e80
13: 0x102449171 - deno_core::ops::serialize_op_result::hd11c6b5e9b89dd12
14: 0x1025a7172 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::hc0e8adb3f8c69ec2
15: 0x1022af4cb - futures_util::stream::stream::StreamExt::poll_next_unpin::h741bad71a4187364
16: 0x1022aa39d - deno_core::runtime::JsRuntime::poll_event_loop::h006bc5cbfab09c79
17: 0x101f032f9 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h6f23e6a72a8d6e8d
18: 0x101ff5c85 - deno::run_command::{{closure}}::ha9495f086a3aef91
19: 0x101ff48fd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf58f70de9058b631
20: 0x101ed830f - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h49583a8e490b5741
21: 0x101ec9383 - deno::main::h4d4e42da58ebceb5
22: 0x101ec5069 - std::sys_common::backtrace::__rust_begin_short_backtrace::h51ec75f4dce6bb35
23: 0x101ec52c5 - _main
Thanks for these detailed reports! main blocker for fixing these was figuring out how to reproduce these. for the first problem "Connection closed normally", i have opened #12437. second one might be a bit more challenging. 3rd one is also going to be a bit difficult
The third case Protocol(SendAfterClosing)
is actually just the reverse. Client connects to the server, and the client echo's its responses, and the server sends sends requests followed by close
.
// echo-client.ts
let ws = new WebSocket("ws://localhost:5000");
const showMessage = (m: any) => {
setTimeout(() => {
try {
ws.send(m.data);
} catch (_e) {
//ignore the errors
}
}, 0);
};
const reconnect = async () => {
console.log("reconnect");
ws.onmessage = null;
ws.onerror = null;
ws.onclose = null;
ws = new WebSocket("ws://localhost:5000");
ws.onmessage = showMessage;
ws.onclose = reconnect;
await new Promise((resolve) => setTimeout(resolve, 100));
ws.onopen = () => {
};
};
ws.onmessage = showMessage;
ws.onclose = reconnect;
//echo-server.ts
const listener = Deno.listen({ port: 5000 });
let connectionCounter = 0;
for await (const conn of listener) {
handleNewConnection(conn);
}
async function handleNewConnection(conn: Deno.Conn) {
const httpConn = Deno.serveHttp(conn);
for await (const { request: req, respondWith: res } of httpConn) {
const { socket: ws, response: resp } = Deno.upgradeWebSocket(req);
console.log("new connection: " + connectionCounter++);
setInterval(() => {
const m = crypto.randomUUID();
setTimeout(() => {
//closed = false;
ws.close();
}, 0);
try {
ws.send(m);
} catch (_e) {
// ignore errors
}
}, 2);
ws.onmessage = (m) => {
console.log("<<<< ", m.data);
};
try {
await res(resp);
} catch (e) {
console.log(e);
}
}
}
@crowlKats I've pulled down your PR and figured out how to build deno
. It seems to fix the ConnectionClosed
case. I have zero knowledge about Rust (I just installed it using brew and learned that there is something called cargo
), so don't trust too much of what I say ;). But I did extend your PR slightly, and in the matcher you added, I added a case for Error:Protocol
. I'm not sure how to limit it to only Protocol(SendAfterClosing)
, but matching this error seems to fix this case.
match res {
Ok(()) => Ok(()),
Err(Error::ConnectionClosed)=> Ok(()),
Err(Error::Protocol { .. })=> Ok(()),
Err(err) => Err(err.into()),
}
Same with the Io
error. I added a matcher for that too. These seem like errors that can be thrown according to https://docs.rs/tungstenite/0.15.0/tungstenite/error/enum.Error.html, so specific handling may be required. After adding both of those matchers (Error::Protocol
and Error:Io
) I haven't been able to reproduce the Panic using these snippets.
The problem with "send after closing" is that it shouldnt even be happening given the JS logic, but i guess there is a race condition with the async ops. we could ignore the error and call it a day, but not too keen on that
When moving our backend from Nodejs to Deno, I was running into the Broken pipe (os error 32)' contains boxed error of unknown type
panic about every 5 minutes on a production server with roughly 200 concurrent connections.
The server hosts a game with matches taking somewhere between 5 and 20 minutes, so simply restarting the server with every panic wasn't an option, because then most matches wouldn't have a chance to finish. When a server crashes people just get kicked out of the game.
I was using deno compile
to send a single binary to the server for it to run. But in the end what fixed it for me was to use deno bundle
and then deno run
that on the server instead, I haven't seen a single panic since this change.
Though I'm worried that the issue is not actually fixed and that this only lowered the chance of it happening.
This should be fixed with the switch from tungstenite to fastwebsockets and JS rewrite. Closing.