rust-libp2p: Is there any way to tell which PeerId sent an inbound message in InboundUpgrade?

Some Context

I'm debugging a heisenbug that occurs intermittently: peer A sends a message to peer B, but peer B is unable to read it due to an unexpected end of file error.

I'm having trouble reproducing it and figuring out which peer causes it. It would help me a lot if I could tell who sent the message, i.e. the PeerId of the sender.

I don't see how to get that info inside upgrade_inbound:

fn upgrade_inbound(self, mut socket: Socket, info: Self::Info)

Socket is multistream_select::negotiated::Negotiated<rw_stream_sink::RwStreamSink<Chan>>

The Question

Is there a way to tell the sender's PeerId inside an InboundUpgrade implementation like this one: upgrade.rs#L115?

Version

libp2p-core: 0.32
libp2p: 0.43

P.S.

The error happens inside read_length_prefixed. The error report looks like this:

Error processing inbound ProtocolMessage: unexpected end of file

     Location:
         /home/circleci/project/particle-protocol/src/libp2p_protocol/upgrade.rs:126:30

     Stack backtrace:
        0: core::ops::function::Fn::call
        1: eyre::capture_handler
        2: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
        3: <libp2p_core::either::EitherFuture2<AFut,BFut> as core::future::future::Future>::poll
        4: futures_util::future::future::FutureExt::poll_unpin
        5: <libp2p_swarm::connection::handler_wrapper::SubstreamUpgrade<UserData,Upgrade> as core::future::future::Future>::poll
        6: futures_util::stream::stream::StreamExt::poll_next_unpin
        7: libp2p_swarm::connection::handler_wrapper::HandlerWrapper<TProtoHandler>::poll
        8: libp2p_swarm::connection::Connection<THandler>::poll
        9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
       10: futures_executor::thread_pool::PoolState::work
       11: std::sys_common::backtrace::__rust_begin_short_backtrace
       12: core::ops::function::FnOnce::call_once{{vtable.shim}}
       13: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                  at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/alloc/src/boxed.rs:1854:9
       14: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                  at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/alloc/src/boxed.rs:1854:9
       15: std::sys::unix::thread::Thread::new::thread_start
                  at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/std/src/sys/unix/thread.rs:108:17
       16: start_thread
       17: clone

The protocol implementation is here: https://github.com/fluencelabs/fluence/blob/master/particle-protocol/src/libp2p_protocol/upgrade.rs#L126
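For readers unfamiliar with this failure mode, here is a minimal sketch of how a length-prefixed read ends in "unexpected end of file" when the sender closes the stream before writing the whole payload. It is a simplified stand-in built only on std: `read_frame` and `write_frame` are hypothetical helpers, and the fixed 4-byte big-endian prefix is an assumption made for brevity, not the framing that libp2p's read_length_prefixed uses.

```rust
use std::io::{self, Cursor, Read};

/// Reads one frame: a 4-byte big-endian length, then that many payload bytes.
/// Hypothetical helper, not libp2p's `read_length_prefixed`.
fn read_frame<R: Read>(reader: &mut R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    // `read_exact` fails with `ErrorKind::UnexpectedEof` if the stream ends
    // before `len` bytes have arrived.
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Encodes a payload as a frame with the same 4-byte length prefix.
fn write_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

fn main() {
    let full = write_frame(b"hello");
    let ok = read_frame(&mut Cursor::new(full.clone()), 1024).unwrap();
    assert_eq!(ok, b"hello".to_vec());

    // Simulate a sender that closes the stream two bytes short of the payload.
    let truncated = full[..full.len() - 2].to_vec();
    let err = read_frame(&mut Cursor::new(truncated), 1024).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);
    println!("truncated frame failed with: {:?}", err.kind());
}
```

The reader cannot tell a slow sender from one that closed early until the stream actually ends, which is why the error only says that bytes were missing and nothing about who sent them.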

Apr 02 '22 17:04 folex

InboundUpgrade has no way to tell who the sender is, because the security handshake, and thus the authentication of the sender, might not have happened yet.

The only way I see is to log the PeerId once the security handshake has succeeded and then try to correlate that log line with the error. Though I guess this is happening on a busy machine and is thus hard to correlate, correct?

Apr 06 '22 16:04 mxinden

Yes, exactly. It's very hard to correlate.

Apr 13 '22 12:04 folex

If it's not possible to provide PeerId information there, maybe it makes sense to provide address info, like a multiaddr?

That would allow me to understand who is on the other end of the socket.

If that makes sense, I can try to implement that.

Apr 13 '22 12:04 folex

If it's not possible to provide PeerId information there, maybe it makes sense to provide address info, like a multiaddr?

Note that we use the same InboundUpgrade and OutboundUpgrade traits for both connection and stream upgrades. In the latter case we don't know the address, i.e. the address is not relevant. We could split the trait into two, one for streams and one for connections, though I am not sure that is worth it.

Apr 17 '22 08:04 mxinden

Well, in the OutboundUpgrade case it seems relevant too, and for a similar reason.

Imagine a situation where you're trying to communicate with some peer, and communication fails at the Inbound/Outbound upgrade level. How do you debug it in a large open network?

Let me try to give more specific arguments.

InboundUpgrade

In my case, some peers close the connection before fully writing the protocol message. That surfaces in InboundUpgrade.

I think I have a fix for that (as described here https://github.com/libp2p/rust-yamux/issues/117).

BUT since the network is slow to adopt updates, some peers still follow the invalid algorithm for sending data (i.e. closing too early). So my logs are still full of EOF errors, and it makes me uneasy, because I can't be sure I have fixed the problem.

If I could tell the multiaddress/PeerId/etc. of the sender, I could go to that peer and ask for its version via another protocol (e.g. identify), or at least tell whether it's one of the peers managed by me.

OutboundUpgrade

This would be a hypothetical example. Consider the following situation:

I send a protocol message to some peer, but sending fails. It throws an EOF error for some peers, and messages do not get delivered.

If it were possible to learn the PeerId/multiaddress, I could pinpoint the failing peer and try to debug it somehow: find out its version, maybe talk to the host of that peer, etc.

Conclusion :)

I see address information as vital for debugging networking problems. It also feels natural to me to be able to access the addresses of sockets, since the OS has that information, if I'm not mistaken.

WDYT?

Thanks for your openness and readiness to discuss possible changes in such a core API, really appreciate that. Cheers!

Apr 19 '22 20:04 folex

I am still a bit hesitant to add an additional argument to OutboundUpgrade and InboundUpgrade for the sake of debugging.

Would it not be an option to return an error from OutboundUpgrade or InboundUpgrade and then log the error along with the peer information in ConnectionHandler::inject_dial_upgrade_error or inject_listen_upgrade_error, or in NetworkBehaviour::inject_dial_failure or inject_listen_failure?

I might be missing something.
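The shape of that suggestion can be sketched with plain std types. This is only an illustration of the idea "the upgrade returns the error, a layer that knows the peer logs it"; `PeerLabel`, `UpgradeError`, `upgrade_inbound` and `describe_upgrade_failure` are hypothetical stand-ins and not libp2p types or callbacks. Which of the callbacks named above actually has the peer information at hand is not modeled here.

```rust
use std::fmt;

/// Hypothetical peer identifier; a stand-in for libp2p's `PeerId`.
#[derive(Debug, Clone, PartialEq)]
struct PeerLabel(String);

/// Hypothetical error type returned by the upgrade.
#[derive(Debug, PartialEq)]
enum UpgradeError {
    UnexpectedEof { expected: usize, got: usize },
}

impl fmt::Display for UpgradeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpgradeError::UnexpectedEof { expected, got } => write!(
                f,
                "unexpected end of file (expected {} bytes, got {})",
                expected, got
            ),
        }
    }
}

/// Stand-in for the upgrade: it sees only the bytes, so it returns the error
/// instead of trying to log who sent them.
fn upgrade_inbound(bytes: &[u8], expected: usize) -> Result<Vec<u8>, UpgradeError> {
    if bytes.len() < expected {
        return Err(UpgradeError::UnexpectedEof { expected, got: bytes.len() });
    }
    Ok(bytes[..expected].to_vec())
}

/// Stand-in for the layer that knows the remote peer: it attaches the peer
/// to the error and produces the log line.
fn describe_upgrade_failure(peer: &PeerLabel, error: &UpgradeError) -> String {
    format!("inbound upgrade from peer {} failed: {}", peer.0, error)
}

fn main() {
    let peer = PeerLabel("peer-A".to_string());
    // The sender announced 8 bytes but closed the stream after 5.
    match upgrade_inbound(b"hello", 8) {
        Ok(msg) => println!("received {} bytes from {}", msg.len(), peer.0),
        Err(e) => eprintln!("{}", describe_upgrade_failure(&peer, &e)),
    }
}
```

Keeping the upgrade free of peer information leaves its signature unchanged, and the log line is produced exactly once, at the point where both the error and the peer are known.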

Apr 23 '22 09:04 mxinden

Thank you, that sounds like what I originally wanted to know!

Apr 25 '22 10:04 folex