rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

swarm/behaviour: add logging or metrics to structs that derive NetworkBehaviour

Open DieracDelta opened this issue 3 years ago • 1 comments

Description

It would be helpful to know which sub-behaviours open and close connections when deriving NetworkBehaviour. Suppose we have something like:

#[derive(NetworkBehaviour)]
#[behaviour(out_event = "NetworkEvent")]
pub struct NetworkDef {
    behaviour_0: Behaviour0,
    behaviour_1: Behaviour1,
    behaviour_2: Behaviour2,
   // other auxilary data structures
}

and I have a bug in Behaviour1 that causes it to initiate 100s of connections all at once. It's tricky to tell that Behaviour1 is the culprit. Having metrics or logging that says Behaviour_i opened n connections would be useful. Or some sort of externally queryable map saying sub behaviour i opened connections to these peers. I don't think this already exists but I could be wrong.

As for an initial implementation, yesterday (if I understood correctly) @mxinden suggested augmenting the poll generated by deriving the NetworkBehaviour with info level logging.

Motivation

I had a behaviour that was exploding connection numbers (and thereby file descriptors). To debug I manually commented out behaviours until the fds used stopped increasing. It would be easier to see if there was some sort of (trace level?) logging. For example:

behaviour 1: opened a connection to peer $PID_1. behaviour 1 now has opened 5 connections in total.
behaviour 1: closed a connection to peer $PID_2. behaviour 1 now has opened 4 connections in total.

Requirements

Open questions

Are you planning to do it yourself in a pull request?

Yes!

DieracDelta avatar Aug 24 '22 16:08 DieracDelta

Thanks for opening this up @DieracDelta!

As for an initial implementation, yesterday (if I understood correctly) @mxinden suggested augmenting the poll generated by deriving the NetworkBehaviour with info level logging.

Yes. While probably not the most sophisticated approach, I think this is a good place to get started. I would imagine something along the lines of:

modified   swarm-derive/src/lib.rs
@@ -516,6 +516,11 @@ fn build_struct(ast: &DeriveInput, data_struct: &DataStruct) -> TokenStream {
                 match #trait_to_impl::poll(&mut self.#field, cx, poll_params) {
                     #generate_event_match_arm
                     std::task::Poll::Ready(#network_behaviour_action::Dial { opts, handler: provided_handler }) => {
+                        log::debug!(
+                            "`NetworkBehaviour` \"{}\" emitted `NetworkBehaviourAction::Dial` to peer {}",
+                            #field, // Somehow convert this to a static string.
+                            opts.get_peer_id,
+                        );
                         return std::task::Poll::Ready(#network_behaviour_action::Dial { opts, handler: #provided_handler_and_new_handlers });
                     }
                     std::task::Poll::Ready(#network_behaviour_action::NotifyHandler { peer_id, handler, event }) => {

@DieracDelta what do you think? Would this suffice to debug issues like the one you faced above? Do you want to prepare a pull request for this?

mxinden avatar Aug 30 '22 06:08 mxinden

Let me know if this is still an issue.

thomaseizinger avatar Mar 29 '23 11:03 thomaseizinger