deno icon indicating copy to clipboard operation
deno copied to clipboard

bug: sending data to websocket after its closed randomly panics

Open DjDeveloperr opened this issue 3 years ago • 9 comments

Currently I'm sending payload to WebSocket wrapped in a try/catch because I want to ignore the errors as they are handled with reconnects in close event, but now Deno randomly panics sometimes with this error instead while no close event fires on JS side, like once a day.

thread 'main' panicked at 'Error 'WebSocket protocol error: Sending after closing is not allowed' contains boxed error of unknown type: Protocol(SendAfterClosing)', cli/errors.rs:35:7
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Backtrace: image

I cannot find a minimal reproduction. For context, I'm using WebSocket to interact with Discord Gateway.

deno v1.10.3 using deno docker (debian)

DjDeveloperr avatar Jun 06 '21 11:06 DjDeveloperr

cc @crowlKats

lucacasonato avatar Jun 07 '21 10:06 lucacasonato

I can reproduce this websocket panic with Deno 1.15.1.

I setup an echo server and a (reconnecting) client that pumps messages to the server. Right before the ws.send, I set a short timeout (3ms) and closed the connection. There appears to be a race condition that happens while the server is sending the response during which time the connection is closed.

Here is the backtrace with RUST_BACKTRACE=full.

thread 'main' panicked at 'Error 'Connection closed normally' contains boxed error of unknown type:
  ConnectionClosed', cli/errors.rs:73:7
stack backtrace:
   0:        0x1060d7f53 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d168763f08a78b9
   1:        0x10563e2cc - core::fmt::write::h0fa46d4891ed63f1
   2:        0x1060d5e4a - std::io::Write::write_fmt::hacbf9e10e17149ce
   3:        0x1060d708f - std::panicking::default_hook::{{closure}}::hee7cf38dedfa76fc
   4:        0x1060d625d - std::panicking::default_hook::hd12252412832a45f
   5:        0x1059a6061 - deno::setup_exit_process_panic_hook::{{closure}}::h70933608affb3773
   6:        0x1060d5cf9 - std::panicking::rust_panic_with_hook::h521fd09e07057893
   7:        0x1060f43fa - std::panicking::begin_panic_handler::{{closure}}::h3588a5a9cca9e8e5
   8:        0x1060f4368 - std::sys_common::backtrace::__rust_end_short_backtrace::h86522c151a195ff9
   9:        0x1060f4310 - _rust_begin_unwind
  10:        0x10727716f - std::panicking::begin_panic_fmt::hf4ebaa1b90291b0b
  11:        0x10574fa2a - deno::errors::get_error_class_name::{{closure}}::h2ee2df1a6d33aa7e
  12:        0x1056a0641 - deno::errors::get_error_class_name::ha85d240f5e3c0e80
  13:        0x105bcf171 - deno_core::ops::serialize_op_result::hd11c6b5e9b89dd12
  14:        0x105d2d172 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::hc0e8adb3f8c69ec2
  15:        0x105a354cb - futures_util::stream::stream::StreamExt::poll_next_unpin::h741bad71a4187364
  16:        0x105a3039d - deno_core::runtime::JsRuntime::poll_event_loop::h006bc5cbfab09c79
  17:        0x1056892f9 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h6f23e6a72a8d6e8d
  18:        0x105688fac - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf543fa3670f831a0
  19:        0x10577df7c - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hde39aa66f49436e3
  20:        0x10577bb53 - deno::run_command::{{closure}}::ha9495f086a3aef91
  21:        0x10577a8fd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf58f70de9058b631
  22:        0x10565e30f - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h49583a8e490b5741
  23:        0x10564f383 - deno::main::h4d4e42da58ebceb5
  24:        0x10564b069 - std::sys_common::backtrace::__rust_begin_short_backtrace::h51ec75f4dce6bb35
  25:        0x10564b2c5 - _main

Here is the echo-server.ts:

//echo-server.ts
const listener = Deno.listen({ port: 5000 });
for await (const conn of listener) {
  handleNewConnection(conn);
}

async function handleNewConnection(conn: Deno.Conn) {
  const httpConn = Deno.serveHttp(conn);
  for await (const { request: req, respondWith: res } of httpConn) {
    const { socket: ws, response: resp } = Deno.upgradeWebSocket(req);
    ws.onmessage = (m) => {
      setTimeout(() => {
        try {
          ws.send(m.data);
        } catch (_e) {
          // ignore the errors
        }
      }, 0);
    };
    res(resp);
  }
}

And the echo-client.ts

let ws = new WebSocket("ws://localhost:5000");

setInterval(() => {
  const m = crypto.randomUUID();
  setTimeout(() => {
    ws.close();
  }, 3);
  try {
    ws.send(m);
  } catch (_e) {
    // ignore errors
  }
}, 5);

const showMessage = (m: any) => {
  console.log("<<<", m.data);
};

const reconnect = () => {
  ws.onmessage = null;
  ws.onerror = null;
  ws.onclose = null;
  ws = new WebSocket("ws://localhost:5000");
  ws.onmessage = showMessage;
  ws.onclose = reconnect;
};

ws.onmessage = showMessage;
ws.onclose = reconnect;

I can even get it to fail without the reconnecting part, but sometimes I have to start and stop the client a few times.

irbull avatar Oct 13 '21 20:10 irbull

Also, on occasion I get this backtrace. Similar, but not exactly the same:

thread 'main' panicked at 'Error 'IO error: Broken pipe (os error 32)' contains boxed error of unknown type:
  Io(Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })
  Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', cli/errors.rs:73:7
stack backtrace:
   0:        0x104f82f53 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d168763f08a78b9
   1:        0x1044e92cc - core::fmt::write::h0fa46d4891ed63f1
   2:        0x104f80e4a - std::io::Write::write_fmt::hacbf9e10e17149ce
   3:        0x104f8208f - std::panicking::default_hook::{{closure}}::hee7cf38dedfa76fc
   4:        0x104f8125d - std::panicking::default_hook::hd12252412832a45f
   5:        0x104851061 - deno::setup_exit_process_panic_hook::{{closure}}::h70933608affb3773
   6:        0x104f80cf9 - std::panicking::rust_panic_with_hook::h521fd09e07057893
   7:        0x104f9f3fa - std::panicking::begin_panic_handler::{{closure}}::h3588a5a9cca9e8e5
   8:        0x104f9f368 - std::sys_common::backtrace::__rust_end_short_backtrace::h86522c151a195ff9
   9:        0x104f9f310 - _rust_begin_unwind
  10:        0x10612216f - std::panicking::begin_panic_fmt::hf4ebaa1b90291b0b
  11:        0x1045faa2a - deno::errors::get_error_class_name::{{closure}}::h2ee2df1a6d33aa7e
  12:        0x10454b641 - deno::errors::get_error_class_name::ha85d240f5e3c0e80
  13:        0x104a7a171 - deno_core::ops::serialize_op_result::hd11c6b5e9b89dd12
  14:        0x104bd8172 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::hc0e8adb3f8c69ec2
  15:        0x1048e04cb - futures_util::stream::stream::StreamExt::poll_next_unpin::h741bad71a4187364
  16:        0x1048db39d - deno_core::runtime::JsRuntime::poll_event_loop::h006bc5cbfab09c79
  17:        0x1045342f9 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h6f23e6a72a8d6e8d
  18:        0x104533fac - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf543fa3670f831a0
  19:        0x104628f7c - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hde39aa66f49436e3
  20:        0x104626b53 - deno::run_command::{{closure}}::ha9495f086a3aef91
  21:        0x1046258fd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf58f70de9058b631
  22:        0x10450930f - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h49583a8e490b5741
  23:        0x1044fa383 - deno::main::h4d4e42da58ebceb5
  24:        0x1044f6069 - std::sys_common::backtrace::__rust_begin_short_backtrace::h51ec75f4dce6bb35
  25:        0x1044f62c5 - _main

irbull avatar Oct 13 '21 21:10 irbull

And if we flip it around, and the server sends the messages to the client, while setting a short timer to close the connection, then you get the boxed error that @DjDeveloperr got:

thread 'main' panicked at 'Error 'WebSocket protocol error: Sending after closing is not allowed' contains boxed error of unknown type:
  Protocol(SendAfterClosing)', cli/errors.rs:73:7
stack backtrace:
   0:        0x102951f53 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d168763f08a78b9
   1:        0x101eb82cc - core::fmt::write::h0fa46d4891ed63f1
   2:        0x10294fe4a - std::io::Write::write_fmt::hacbf9e10e17149ce
   3:        0x10295108f - std::panicking::default_hook::{{closure}}::hee7cf38dedfa76fc
   4:        0x10295025d - std::panicking::default_hook::hd12252412832a45f
   5:        0x102220061 - deno::setup_exit_process_panic_hook::{{closure}}::h70933608affb3773
   6:        0x10294fcf9 - std::panicking::rust_panic_with_hook::h521fd09e07057893
   7:        0x10296e3fa - std::panicking::begin_panic_handler::{{closure}}::h3588a5a9cca9e8e5
   8:        0x10296e368 - std::sys_common::backtrace::__rust_end_short_backtrace::h86522c151a195ff9
   9:        0x10296e310 - _rust_begin_unwind
  10:        0x103af116f - std::panicking::begin_panic_fmt::hf4ebaa1b90291b0b
  11:        0x101fc9a2a - deno::errors::get_error_class_name::{{closure}}::h2ee2df1a6d33aa7e
  12:        0x101f1a641 - deno::errors::get_error_class_name::ha85d240f5e3c0e80
  13:        0x102449171 - deno_core::ops::serialize_op_result::hd11c6b5e9b89dd12
  14:        0x1025a7172 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::hc0e8adb3f8c69ec2
  15:        0x1022af4cb - futures_util::stream::stream::StreamExt::poll_next_unpin::h741bad71a4187364
  16:        0x1022aa39d - deno_core::runtime::JsRuntime::poll_event_loop::h006bc5cbfab09c79
  17:        0x101f032f9 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h6f23e6a72a8d6e8d
  18:        0x101ff5c85 - deno::run_command::{{closure}}::ha9495f086a3aef91
  19:        0x101ff48fd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf58f70de9058b631
  20:        0x101ed830f - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h49583a8e490b5741
  21:        0x101ec9383 - deno::main::h4d4e42da58ebceb5
  22:        0x101ec5069 - std::sys_common::backtrace::__rust_begin_short_backtrace::h51ec75f4dce6bb35
  23:        0x101ec52c5 - _main

irbull avatar Oct 14 '21 04:10 irbull

Thanks for these detailed reports! main blocker for fixing these was figuring out how to reproduce these. for the first problem "Connection closed normally", i have opened #12437. second one might be a bit more challenging. 3rd one is also going to be a bit difficult

crowlKats avatar Oct 14 '21 05:10 crowlKats

The third case Protocol(SendAfterClosing) is actually just the reverse. Client connects to the server, and the client echo's its responses, and the server sends sends requests followed by close.

// echo-client.ts
let ws = new WebSocket("ws://localhost:5000");

const showMessage = (m: any) => {
  setTimeout(() => {
    try {
      ws.send(m.data);
    } catch (_e) {
      //ignore the errors
    }
  }, 0);
};

const reconnect = async () => {
  console.log("reconnect");
  ws.onmessage = null;
  ws.onerror = null;
  ws.onclose = null;
  ws = new WebSocket("ws://localhost:5000");
  ws.onmessage = showMessage;
  ws.onclose = reconnect;
  await new Promise((resolve) => setTimeout(resolve, 100));
  ws.onopen = () => {
  };
};

ws.onmessage = showMessage;
ws.onclose = reconnect;
//echo-server.ts
const listener = Deno.listen({ port: 5000 });
let connectionCounter = 0;
for await (const conn of listener) {
  handleNewConnection(conn);
}

async function handleNewConnection(conn: Deno.Conn) {
  const httpConn = Deno.serveHttp(conn);
  for await (const { request: req, respondWith: res } of httpConn) {
    const { socket: ws, response: resp } = Deno.upgradeWebSocket(req);
    console.log("new connection: " + connectionCounter++);
    setInterval(() => {
      const m = crypto.randomUUID();
      setTimeout(() => {
        //closed = false;
        ws.close();
      }, 0);
      try {
        ws.send(m);
      } catch (_e) {
        // ignore errors
      }
    }, 2);
    ws.onmessage = (m) => {
      console.log("<<<< ", m.data);
    };
    try {
      await res(resp);
    } catch (e) {
      console.log(e);
    }
  }
}

@crowlKats I've pulled down your PR and figured out how to build deno. It seems to fix the ConnectionClosed case. I have zero knowledge about Rust (I just installed it using brew and learned that there is something called cargo), so don't trust too much of what I say ;). But I did extend your PR slightly, and in the matcher you added, I added a case for Error:Protocol. I'm not sure how to limit it to only Protocol(SendAfterClosing), but matching this error seems to fix this case.

    match res {
      Ok(()) => Ok(()),
      Err(Error::ConnectionClosed)=> Ok(()),
      Err(Error::Protocol { .. })=> Ok(()),
      Err(err) => Err(err.into()),
    }

irbull avatar Oct 14 '21 20:10 irbull

Same with the Io error. I added a matcher for that too. These seem like errors that can be thrown according to https://docs.rs/tungstenite/0.15.0/tungstenite/error/enum.Error.html, so specific handling may be required. After adding both of those matchers (Error::Protocol and Error:Io) I haven't been able to reproduce the Panic using these snippets.

irbull avatar Oct 14 '21 20:10 irbull

The problem with "send after closing" is that it shouldnt even be happening given the JS logic, but i guess there is a race condition with the async ops. we could ignore the error and call it a day, but not too keen on that

crowlKats avatar Oct 14 '21 23:10 crowlKats

When moving our backend from Nodejs to Deno, I was running into the Broken pipe (os error 32)' contains boxed error of unknown type panic about every 5 minutes on a production server with roughly 200 concurrent connections. The server hosts a game with matches taking somewhere between 5 and 20 minutes, so simply restarting the server with every panic wasn't an option, because then most matches wouldn't have a chance to finish. When a server crashes people just get kicked out of the game.

I was using deno compile to send a single binary to the server for it to run. But in the end what fixed it for me was to use deno bundle and then deno run that on the server instead, I haven't seen a single panic since this change. Though I'm worried that the issue is not actually fixed and that this only lowered the chance of it happening.

jespertheend avatar Sep 13 '22 11:09 jespertheend

This should be fixed with the switch from tungstenite to fastwebsockets and JS rewrite. Closing.

littledivy avatar Dec 02 '23 05:12 littledivy