go-libp2p
Get statistics on transport used for incoming connection by agent version
It would be useful to see the transport breakdown for incoming connections by agent version so that we could see:
- What percent of newer nodes are dialing via QUIC?
- Does a change to dialing (smart dialing logic) negatively or positively impact the prevalence of QUIC?
- Are other implementations increasing QUIC prevalence?
I wonder if this is something the folks at probe-lab could help with, since it's information about the whole libp2p network and not just go-libp2p.
cc @yiannisbot @dennis-tra for possible probelab help.
Is "nodes are dialing via QUIC" == "nodes listening on UDP/QUIC" a fair assumption? I think you can theoretically listen on, e.g., just TCP but dial via QUIC, but that should be the exception. If we assume the above, I could give you numbers on the distribution for the several DHT networks.
Nodes may be on a network that blocks UDP (a UDP black hole).
sure! I thought, as a first-order approximation, the above assumption would be good enough (we'd have that data readily available). I actually have no idea or intuition about how prevalent networks that block UDP are.
If that's an unsuitable way forward, I think we'd indeed need to track incoming connections (or rather dial attempts). A simple idea would be to deploy libp2p hosts in, e.g., the IPFS (or whatever) network and then keep statistics about incoming dials. I'm not entirely sure how the go-libp2p behaviour of concurrently dialling up to eight addresses would impact the measurement. If a TCP and a QUIC dial race each other, it could theoretically happen that we only see the TCP connection attempt (for whatever reason: packets dropped, UDP blocked, etc.). This would mean we would miscount "What percent of newer nodes are dialing via QUIC?".
From an implementation perspective, I know there's a mechanism to get notified about connection lifecycle events. Is there something similar for transports? In the past, I just wrapped the QUIC/TCP/WS transports to get notified if the Dial method was called. However, this is the wrong way around. I would get notified about outgoing dials but want to be notified about incoming dials.
@yiannisbot, something to discuss during our colo on Thursday 👍
> sure! I thought, as a first-order approximation, the above assumption would be good enough (we'd have that data readily available). I actually have no idea or intuition about how prevalent networks that block UDP are.
That's partly why I want to get this information. I also want to get the information of how many nodes in the wild are choosing QUIC of their own volition rather than just accepting QUIC. My guess is these numbers are roughly the same, but we'd need to measure to be sure.
> I know there's a mechanism to get notified about connection lifecycle events. Is there something similar for transports?
You can get the transport from the conn object there. You can also get the peer ID, and then dedupe things later if needed (e.g. if you get both a TCP and a QUIC conn from the same peer, that peer should be counted in the set of peers who dial via QUIC).
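A minimal sketch of that dedupe step, assuming the connected callback has already recorded the remote peer ID and the remote multiaddr as strings for each inbound connection (in go-libp2p these would come from the `network.Conn` passed to the notifee). The names `connEvent`, `transportOf` and `dialsViaQUIC` are made up for illustration, and the transport is guessed from protocol names in the address text rather than read from a real transport object:

```go
package main

import (
	"fmt"
	"strings"
)

// connEvent is a hypothetical record of one inbound connection.
type connEvent struct {
	PeerID     string // remote peer ID, as a string
	RemoteAddr string // remote multiaddr in its textual form
}

// transportOf derives a coarse transport label from the protocol names
// that appear in a textual multiaddr. It does not validate the address.
func transportOf(remoteAddr string) string {
	seen := map[string]bool{}
	for _, part := range strings.Split(remoteAddr, "/") {
		seen[part] = true
	}
	switch {
	case seen["webtransport"]:
		return "webtransport"
	case seen["quic"] || seen["quic-v1"]:
		return "quic"
	case seen["ws"] || seen["wss"]:
		return "websocket"
	case seen["tcp"]:
		return "tcp"
	default:
		return "other"
	}
}

// dialsViaQUIC returns one entry per peer. The value is true if at least
// one of the peer's connections used QUIC, even if others used TCP.
func dialsViaQUIC(events []connEvent) map[string]bool {
	result := map[string]bool{}
	for _, ev := range events {
		isQUIC := transportOf(ev.RemoteAddr) == "quic"
		result[ev.PeerID] = result[ev.PeerID] || isQUIC
	}
	return result
}

func main() {
	events := []connEvent{
		{PeerID: "peerA", RemoteAddr: "/ip4/192.0.2.1/tcp/4001"},
		{PeerID: "peerA", RemoteAddr: "/ip4/192.0.2.1/udp/4001/quic-v1"},
		{PeerID: "peerB", RemoteAddr: "/ip4/192.0.2.2/tcp/4001"},
	}
	fmt.Println(dialsViaQUIC(events))
}
```

Keeping a per-peer boolean instead of per-connection counts means a peer that raced a TCP and a QUIC dial is only counted once, on the QUIC side.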
Happy to spend an hour to pair on this to kick it off :)
> That's partly why I want to get this information. I also want to get the information of how many nodes in the wild are choosing QUIC of their own volition rather than just accepting QUIC. My guess is these numbers are roughly the same, but we'd need to measure to be sure.
I see 👍
> Happy to spend an hour to pair on this to kick it off :)
Cool! Would love to hear more context. We didn't manage to chat about this during our colo yesterday. I'll bring it up during our weekly sync today :)
Could we use the honeypot, or the infrastructure we want to have set up from several vantage points to count incoming connections over QUIC? Monitoring is not going to be from many vantage points, but over a long period of time, it should be a statistically stable sample.
Meeting with Marco 2023-04-03:
- [ ] subscribe to connect messages. Labels: `transport`, `agent_version`
  - agent version would be stripped to kubo/x.y.z + all others
- [ ] expose prometheus metric in our DHT lookup measurement
  - increment every time a connect event occurs
  - first wait for identify to complete
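The label handling in these notes could look roughly like the sketch below. It uses a plain map as a stand-in for a Prometheus counter, and `normalizeAgent`, `labels` and `connectCounter` are hypothetical names. Two things are assumptions on top of the notes: every non-kubo agent is folded into a single `other` label, and an empty agent string (identify did not complete) is labelled `unknown`:

```go
package main

import (
	"fmt"
	"regexp"
)

// kuboVersion matches the "kubo/x.y.z" prefix of an agent version string.
var kuboVersion = regexp.MustCompile(`^kubo/(\d+\.\d+\.\d+)`)

// normalizeAgent strips an agent version down to "kubo/x.y.z".
// Assumption: all non-kubo agents become "other", empty becomes "unknown".
func normalizeAgent(agent string) string {
	if agent == "" {
		return "unknown"
	}
	if m := kuboVersion.FindStringSubmatch(agent); m != nil {
		return "kubo/" + m[1]
	}
	return "other"
}

// labels is the label pair from the meeting notes.
type labels struct {
	Transport    string
	AgentVersion string
}

// connectCounter is an in-memory stand-in for a labelled counter metric.
type connectCounter map[labels]int

// inc records one connect event under the normalized label pair.
func (c connectCounter) inc(transport, agent string) {
	c[labels{Transport: transport, AgentVersion: normalizeAgent(agent)}]++
}

func main() {
	counter := connectCounter{}
	counter.inc("quic", "kubo/0.18.1/675f8bd/docker")
	counter.inc("quic", "kubo/0.18.1/")
	counter.inc("tcp", "go-ipfs/0.12.0/")
	counter.inc("tcp", "")
	for l, n := range counter {
		fmt.Printf("transport=%s agent_version=%s count=%d\n", l.Transport, l.AgentVersion, n)
	}
}
```

Collapsing the agent string before it becomes a label keeps the number of distinct label values small, which matters once this is exported as a real metric.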
alternative approach, using a database:
- create a `connections` table. Columns:
  - peer_id
  - transport
  - timestamp
  - agent_version
  - ip_address
  - security transport
  - multiplexer (yamux/mplex)
Hi Marco, copying over our discussion from Slack here.
I took all the connections that I’ve tracked and found which peers connected via quic (and also tcp) and which peers only connected via tcp. I only considered those peers that connected at least 10 times with us, so that we have a reasonable chance that a quic connection has succeeded if that peer dials with quic. Then, I checked which agent versions these peers reported and tallied them up by agent version. This is the result:

For most of the connections I couldn't record an agent version (see the bar on the far right). Maybe I’m doing something wrong? This is the relevant line.
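For reference, the tally described above can be sketched as follows. This is not the code behind the plot, just a minimal reconstruction of the steps under some assumptions: `connRecord`, `tally` and `tallyByAgent` are hypothetical names, only `quic` and `tcp` transports are expected in the data (anything that is not `quic` lands in the TCP-only bucket), and peers without a recorded agent version are grouped under `unknown`:

```go
package main

import "fmt"

// connRecord is a hypothetical row from the tracked connections.
type connRecord struct {
	PeerID       string
	Transport    string // "quic" or "tcp" in this sketch
	AgentVersion string // empty if identify did not yield one
}

// tally counts peers per agent version.
type tally struct {
	QUIC    int // peers with at least one QUIC connection
	TCPOnly int // peers that never connected via QUIC
}

// tallyByAgent keeps peers with at least minConns connections and
// classifies each one as QUIC (possibly also TCP) or TCP-only.
func tallyByAgent(records []connRecord, minConns int) map[string]tally {
	type peerInfo struct {
		conns   int
		sawQUIC bool
		agent   string
	}
	peers := map[string]*peerInfo{}
	for _, r := range records {
		p := peers[r.PeerID]
		if p == nil {
			p = &peerInfo{}
			peers[r.PeerID] = p
		}
		p.conns++
		if r.Transport == "quic" {
			p.sawQUIC = true
		}
		if r.AgentVersion != "" {
			p.agent = r.AgentVersion // keep the last non-empty value
		}
	}

	out := map[string]tally{}
	for _, p := range peers {
		if p.conns < minConns {
			continue // too few samples to trust a "TCP-only" verdict
		}
		agent := p.agent
		if agent == "" {
			agent = "unknown"
		}
		t := out[agent]
		if p.sawQUIC {
			t.QUIC++
		} else {
			t.TCPOnly++
		}
		out[agent] = t
	}
	return out
}

func main() {
	var records []connRecord
	for i := 0; i < 10; i++ {
		records = append(records, connRecord{PeerID: "peerA", Transport: "tcp", AgentVersion: "kubo/0.18.1"})
	}
	records = append(records, connRecord{PeerID: "peerA", Transport: "quic", AgentVersion: "kubo/0.18.1"})
	fmt.Println(tallyByAgent(records, 10))
}
```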
Another angle to look at the data. Here are all connection events and their corresponding transport:

@dennis-tra
> For most of the connections I couldn't record an agent version (see the bar on the far right). Maybe I’m doing something wrong? This is the relevant line.
The code that you have pointed to uses go-libp2p v0.26.3. Maybe you're running into this race condition with using IdentifyWait within a network notifiee: https://github.com/libp2p/go-libp2p/pull/2173
nice, thanks for the pointer @sukunrt! This IdentifyWait thing was bugging me not only for this - great to see the race condition go away 👍
I think we can close this issue now. Thanks @dennis-tra!