rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

`unreachable` panic libp2p-swarm-0.44.2/src/handler/either.rs:106

Open joshuef opened this issue 1 year ago • 2 comments

Summary

Seemingly the unreachable code on a match in the swarm handler is panicking?!

This does not actually crash our app (I assume it's a spawned thread).

But it's not clear to me how this is even possible :|

Expected behavior

no panic

Actual behavior

panic

Relevant log output

May 20 07:07:36 node-1 safenode[23138]: The application panicked (crashed).
May 20 07:07:36 node-1 safenode[23138]: Message:  internal error: entered unreachable code
May 20 07:07:36 node-1 safenode[23138]: Location: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-swarm-0.44.2/src/handler/either.rs:106
May 20 07:07:36 node-1 safenode[23138]: Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
May 20 07:07:36 node-1 safenode[23138]: Run with RUST_BACKTRACE=full to include source snippets.

root@node-1:/# journalctl -u safenode7.service 
May 21 16:13:20 node-1 safenode[23146]: The application panicked (crashed).
May 21 16:13:20 node-1 safenode[23146]: Message:  internal error: entered unreachable code
May 21 16:13:20 node-1 safenode[23146]: Location: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-swarm-0.44.2/src/handler/either.rs:106

Possible Solution

No response

Version

No response

Would you like to work on fixing this bug ?

Maybe

joshuef avatar May 22 '24 01:05 joshuef

Thank you for the report. It looks like it happens when receiving an event from the behaviour.

https://github.com/libp2p/rust-libp2p/blob/ad7ad5b3fc5b4bc9a431ece90e9a5ce8c33ca0e2/swarm/src/handler/either.rs#L102-L108

Could you provide any code to reproduce it and any possible logs?

dariusc93 avatar May 22 '24 02:05 dariusc93

it seems we had two relay events around the panic time:

[2024-05-20T07:07:36.488761Z INFO sn_networking::event::swarm] relay client event event=OutboundCircuitEstablished { relay_peer_id: PeerId("12D3KooWPQYu4GjLEEwxp1BmgneDDwajawjUhT9ZupTbp6LRT4Ra"), limit: None }```


```May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]: Location: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-swarm-0.44.2/src/handler/either.rs:106
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:                                 ⋮ 7 frames hidden ⋮                               
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:    8: core::panicking::panic::h584a128616e7fa9b
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:       at <unknown source file>:<unknown line>
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:    9: <libp2p_swarm::handler::select::ConnectionHandlerSelect<TProto1,TProto2> as libp2p_swarm::handler::ConnectionHandler>::on_behaviour_event::h75b996a885f09be9
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:       at <unknown source file>:<unknown line>
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:   10: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h0adc49c52e11e997
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:       at <unknown source file>:<unknown line>
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:   11: tokio::runtime::task::core::Core<T,S>::poll::h96755579c44ce1d6
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:       at <unknown source file>:<unknown line>
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:   12: tokio::runtime::task::raw::poll::h5bedef3875a2cdb8
May 22 20:41:19 ShuNetV3-node-4 safenode[3790806]:       at <unknown source file>:<unknown line>

trying to gather more logs (it's not super frequent, but we are seeing it often ish, we've restarted nodes with debugger attached just now)

joshuef avatar May 22 '24 22:05 joshuef

Is there any more information on this panic, we are also seeing in the wild, but whats causing it is evading us completely at the moment. Its happening every now and then, and this is still present on 0.45.1

2024-12-09T14:23:17+02:00 INFO  tari::p2pool::server::p2p Connection established: PeerId("12D3KooWDSBRafT4bPq5zbSod1xrtQ27BYXymBTovVBD4VsvioQC") -> Dialer { address: /ip4/152.228.210.16/tcp/19001/p2p/12D3KooWD6GY3c8cz6AwKaDaqmqGCbmewhjKT5ULN9JUB5oUgWjS/p2p-circuit/p2p/12D3KooWDSBRafT4bPq5zbSod1xrtQ27BYXymBTovVBD4VsvioQC, role_override: Dialer, port_use: Reuse } (1/Some([])/7.6865053s)
2024-12-09T14:23:17+02:00 ERROR tari::p2pool::main Panic occurred at 2024-12-09 12:23:17 file: '\.cargo\git\checkouts\rust-libp2p-cb9e6d071e393592\3d918cc\swarm\src\handler\either.rs', line: 106: internal error: entered unreachable code
2024-12-09T14:23:17+02:00 WARN  tari::p2pool::server::p2p Outgoing connection error: PeerId("12D3KooWGXQ8QEPYxmAAbREUSeFKgMYesUKA5923tQKXZcSg7oJz") -> Transport([(/ip4/54.37.130.231/tcp/19001/p2p/12D3KooWGXQ8QEPYxmAAbREUSeFKgMYesUKA5923tQKXZcSg7oJz, Other(Custom { kind: Other, error: Other(Right(Left(Left(Right(Apply(Io(Os { code: 10093, kind: Uncategorized, message: "Either the application has not called WSAStartup, or WSAStartup failed.error: process didn't exit successfully: `D:\projects\sha-p2pool\target\release\sha_p2pool.exe start -b . --squad default_2 --stable-peer --mdns-disabled --randomx-disabled --debug-print-chain` (exit code: 500)
Panic at 2024-12-07 08:47:09 file: '/.cargo/git/checkouts/rust-libp2p-cb9e6d071e393592/3d918cc/swarm/src/handler/either.rs', line: 106: internal error: entered unreachable code
2024-12-07T10:47:09+02:00 INFO  tari::p2pool::server::p2p Connection established: PeerId("12D3KooWGjKRrhLd6sUUGLepJmh3cu9weQziUW2V6WrFmD14c3X2") -> Dialer { address: /ip4/54.37.130.231/tcp/19001/p2p/12D3KooWGXQ8QEPYxmAAbREUSeFKgMYesUKA5923tQKXZcSg7oJz/p2p-circuit/p2p/12D3KooWGjKRrhLd6sUUGLepJmh3cu9weQziUW2V6WrFmD14c3X2, role_override: Dialer, port_use: Reuse } (1/Some([(/ip4/51.79.103.128/tcp/19001/p2p/12D3KooWLy2xhRfV5eh6k95LJ4T1Uv7vRhdHvL69mnHdB9xJqhFS/p2p-circuit/p2p/12D3KooWGjKRrhLd6sUUGLepJmh3cu9weQziUW2V6WrFmD14c3X2, Other(Custom { kind: Other, error: Other(Left(Left(Left(Connect(NoReservation))))) }))])/3.594297084s)

SWvheerden avatar Dec 11 '24 06:12 SWvheerden

Could you share your NetworkBehavior setup? I.e. which behaviors are you using and in which order? And specifically: which two behaviors are you using in the Either behavior?

And do you ever toggle between them during runtime?

elenaf9 avatar Dec 11 '24 08:12 elenaf9

are you talking about this: https://github.com/tari-project/sha-p2pool/blob/4b96f8bb547ebc650390eee6fcbcde66c2ffa114/src/server/p2p/setup.rs#L146

SWvheerden avatar Dec 11 '24 09:12 SWvheerden

I just noticed that the relay protocol internally uses Either as connection handler, and the relay protocol also seems to be used by both of you. My guess is that there is an issue somewhere in the relay network behavior implementation, where the behavior notifies a wrong connection handler, thus causing the panic. I am still trying to figure out where exactly this happens.

Edit: well, could also be dcutr.

elenaf9 avatar Dec 11 '24 10:12 elenaf9

this was addressed on https://github.com/libp2p/rust-libp2p/pull/5765 and can be closed right @elenaf9?

jxs avatar Jan 28 '25 12:01 jxs

Yes. As I already suspected it was a problem in the relay network behavior, that is fixed with #5765. @joshuef please re-open if the issue persists.


That said, I think we should also reconsider panicking in the impl ConnectionHandler for Either in the first place. The unreachable were placed there under the assumption that the Either as ConnectionHandler is used together with Either as NetworkBehavior. Here, the statements are actually unreachable. However, once it is used by other protocols like dcutr and relay, this is not guaranteed, since a wrongly forwarded event can trigger the unreachable. Wdyt of instead dropping events that are forwarded to the wrong handler, with a warning?

elenaf9 avatar Jan 28 '25 14:01 elenaf9