neqo icon indicating copy to clipboard operation
neqo copied to clipboard

Crash in [@ neqo_transport::connection::Connection::build_packet_header]

Open KershawChang opened this issue 1 year ago • 1 comments

See https://bugzilla.mozilla.org/show_bug.cgi?id=1829989.

The crash reason is called Option::unwrap()on aNone value. Crash stack:

8 libxul.so core::panicking::panic library/core/src/panicking.rs:114 cfi
9 libxul.so neqo_transport::connection::Connection::build_packet_header third_party/rust/neqo-transport/src/packet/mod.rs:0 cfi
10 libxul.so neqo_transport::connection::Connection::output_close third_party/rust/neqo-transport/src/connection/mod.rs:1850 inlined
10 libxul.so neqo_transport::connection::Connection::output third_party/rust/neqo-transport/src/connection/mod.rs:1760 inlined
10 libxul.so neqo_transport::connection::Connection::process_output third_party/rust/neqo-transport/src/connection/mod.rs:1004 cfi
11 libxul.so neqo_http3::connection_client::Http3Client::process_output third_party/rust/neqo-http3/src/connection_client.rs:850

It's not clear to me where unwrap is called inside function build_packet_header. Maybe @martinthomson knows?

KershawChang avatar Apr 26 '23 07:04 KershawChang

The likely source of the unwrap call is in Path::remote_cid or Path::local_cid. If we don't have a connection ID for the path, we could be in a position where this panics. Those functions will likely be inlined.

The remote connection ID is set when a path becomes permanent. Temporary paths is a concept we use for incoming packets on new paths. We should not be sending CONNECTION_CLOSE on a temporary path, so this is not likely to be the cause of the problem.

The local connection ID is set only for handshake packets. That carries a risk. If we are in a position where we don't have a local connection ID on the path, but we have to send CONNECTION_CLOSE in a long header, we might hit this path. I would look at calls to ensure_error_path to see if we are creating a path without a local connection ID from a short-header packet.

A simple test you might write to evaluate that is one where you haven't completed the handshake, but then a packet containing a garbage frame comes in on a different path. That packet would have a short header (or not, check both). If that crashes, then my theory might be solid.

That's what I'd do to understand the problem, which is always the first thing to try. We might get a better and more comprehensive fix if we do that. A simple patch to prevent the panic might be to check in output_close, whether the path has a local connection ID, before trying to send a close for any PacketNumberSpace that is not PacketNumberSpace::ApplicationData.

martinthomson avatar Apr 26 '23 23:04 martinthomson