swarm/behaviour: add logging or metrics to structs that derive NetworkBehaviour
Description
It would be helpful to know which sub-behaviours open and close connections when deriving NetworkBehaviour. Suppose we have something like:
#[derive(NetworkBehaviour)]
#[behaviour(out_event = "NetworkEvent")]
pub struct NetworkDef {
behaviour_0: Behaviour0,
behaviour_1: Behaviour1,
behaviour_2: Behaviour2,
// other auxilary data structures
}
and I have a bug in Behaviour1 that causes it to initiate 100s of connections all at once. It's tricky to tell that Behaviour1 is the culprit. Having metrics or logging that says Behaviour_i opened n connections would be useful. Or some sort of externally queryable map saying sub behaviour i opened connections to these peers. I don't think this already exists but I could be wrong.
As for an initial implementation, yesterday (if I understood correctly) @mxinden suggested augmenting the poll generated by deriving the NetworkBehaviour with info level logging.
Motivation
I had a behaviour that was exploding connection numbers (and thereby file descriptors). To debug I manually commented out behaviours until the fds used stopped increasing. It would be easier to see if there was some sort of (trace level?) logging. For example:
behaviour 1: opened a connection to peer $PID_1. behaviour 1 now has opened 5 connections in total.
behaviour 1: closed a connection to peer $PID_2. behaviour 1 now has opened 4 connections in total.
Requirements
Open questions
Are you planning to do it yourself in a pull request?
Yes!
Thanks for opening this up @DieracDelta!
As for an initial implementation, yesterday (if I understood correctly) @mxinden suggested augmenting the
pollgenerated by deriving theNetworkBehaviourwithinfolevel logging.
Yes. While probably not the most sophisticated approach, I think this is a good place to get started. I would imagine something along the lines of:
modified swarm-derive/src/lib.rs
@@ -516,6 +516,11 @@ fn build_struct(ast: &DeriveInput, data_struct: &DataStruct) -> TokenStream {
match #trait_to_impl::poll(&mut self.#field, cx, poll_params) {
#generate_event_match_arm
std::task::Poll::Ready(#network_behaviour_action::Dial { opts, handler: provided_handler }) => {
+ log::debug!(
+ "`NetworkBehaviour` \"{}\" emitted `NetworkBehaviourAction::Dial` to peer {}",
+ #field, // Somehow convert this to a static string.
+ opts.get_peer_id,
+ );
return std::task::Poll::Ready(#network_behaviour_action::Dial { opts, handler: #provided_handler_and_new_handlers });
}
std::task::Poll::Ready(#network_behaviour_action::NotifyHandler { peer_id, handler, event }) => {
@DieracDelta what do you think? Would this suffice to debug issues like the one you faced above? Do you want to prepare a pull request for this?
Let me know if this is still an issue.