Is there any way to tell what PeerId has sent Inbound message in InboundUpgrade?
Some Context
I'm debugging a heisenbug which happens periodically: peer A sends message to peer B, but peer B is unable to read it due to unexpected end of file error.
And I'm having trouble reproducing and/or figuring out, which peer causes this. It would help me a lot if I could tell who has sent the message, i.e. PeerId of the sender.
I don't see how to get that info inside upgrade_inbound:
fn upgrade_inbound(self, mut socket: Socket, info: Self::Info)
Socket is multistream_select::negotiated::Negotiated<rw_stream_sink::RwStreamSink<Chan>>
The Question
Is there a way to tell sender PeerId inside InboundUpgrade implementation like this one upgrade.rs#L115?
Version
libp2p-core: 0.32 libp2p: 0.43
P.S.
Error happens inside read_length_prefixed. Exception looks like this
Error processing inbound ProtocolMessage: unexpected end of file
Location:
/home/circleci/project/particle-protocol/src/libp2p_protocol/upgrade.rs:126:30
Stack backtrace:
0: core::ops::function::Fn::call
1: eyre::capture_handler
2: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
3: <libp2p_core::either::EitherFuture2<AFut,BFut> as core::future::future::Future>::poll
4: futures_util::future::future::FutureExt::poll_unpin
5: <libp2p_swarm::connection::handler_wrapper::SubstreamUpgrade<UserData,Upgrade> as core::future::future::Future>::poll
6: futures_util::stream::stream::StreamExt::poll_next_unpin
7: libp2p_swarm::connection::handler_wrapper::HandlerWrapper<TProtoHandler>::poll
8: libp2p_swarm::connection::Connection<THandler>::poll
9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
10: futures_executor::thread_pool::PoolState::work
11: std::sys_common::backtrace::__rust_begin_short_backtrace
12: core::ops::function::FnOnce::call_once{{vtable.shim}}
13: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/alloc/src/boxed.rs:1854:9
14: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/alloc/src/boxed.rs:1854:9
15: std::sys::unix::thread::Thread::new::thread_start
at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/std/src/sys/unix/thread.rs:108:17
16: start_thread
17: clone
Protocol implementation is here https://github.com/fluencelabs/fluence/blob/master/particle-protocol/src/libp2p_protocol/upgrade.rs#L126
InboundUpgrade has no way to tell who the sender is, given that the security handshake, and thus the authentication of the sender might not have happened yet.
The only way I see is logging the PeerId once the security handshake has been successful and then try to correlate the log line with the panic. Though I guess this is happening on a busy machine and thus hard to correlate, correct?
Yes, exactly. It's very hard to correlate.
If it's not possible to provide PeerId information there, maybe it makes sense provide address info? Like multiaddr.
That would allow me to understand who is on the other end of the socket.
If that makes sense, I can try to implement that.
If it's not possible to provide
PeerIdinformation there, maybe it makes sense provide address info? Like multiaddr.
Note that we are using the same InboundUpgrade and OutboundUpgrade across connection and stream upgrades. E.g. in the latter case we don't know the address, i.e. the address is not relevant. We could break the trait into two one for streams one for connections, though I am not sure that is worth it?
Well, in OutboundUpgrade case it seems relevant, too, and for similar reason.
Imagine a situation, where you're trying to communicate with some peer, and communication fails at Inbound/Outbound Upgrade level. How do you debug it in an open big network?
Let me try to give more specific arguments.
InboundUpgrade
In my case, some peers close connection before fully writing protocol message. That's InboundUpgrade.
And that's exactly what happened in my case. And I think I have a fix for that (as described here https://github.com/libp2p/rust-yamux/issues/117).
BUT since the network is rather inert when it comes to updates, some peers still follow invalid algorithm for sending data (ie closing too early). So my logs are still full of EOF errors, and it makes me uneasy, because I can't be sure I have fixed the problem.
If I could tell multiaddress/peerid/etc of the sender, I could go to that peer and ask for it's version via other protocol (eg identify), or at least tell if it's one of peers managed by me or not.
OutboundUpgrade
This would be a hypothetical example. Consider the following situation:
I send protocol message to some peer, but sending fails. It throws an EOF error for some peers, and messages do not get delivered.
If PeerId/multiaddress would be possible to learn, then I could pinpoint the peer at failure, and try to debug it somehow. Know its version, maybe talk to hoster of that peer, etc.
Conclusion :)
I see address information as vital for debugging networking problems. It is also feels natural to me to be able to access addresses of sockets, since OS has that information if I'm not mistaken.
WDYT?
Thanks for your openness and readiness to discuss possible changes in such a core API, really appreciate that. Cheers!
I am still a bit reserved to add an additional argument to OutboundInboundUpgrade for the sake of debugging.
Would it not be an option to return an error from OutboundUpgrade or InboundUpgrade and then log the error along with the peer information in ConnectionHandler::inject_dial_upgrade_error or inject_listen_upgrade_error or NetworkBehaviour::inject_dial_failure or inject_listen_failure?
I might be missing something.
Thank you, that sounds like what I originally wanted to know!