Include network connection metrics in topology view

Open jlpadilla opened this issue 7 years ago • 6 comments

Currently, the topology view shows the connections between nodes, but it doesn’t show the metrics for those connections. I think it would be useful to show connection metrics in the topology view, so users can visualize a “heat map” of the nodes getting the most network activity.

A simple solution is to add the connection data to the node’s metrics so the node renders with a fill, like CPU and Memory. The connection data is easily accessible because it’s being used to compute adjacency.

We are interested in contributing this change, and would like feedback on the proposed solution before finalizing a pull request.

jlpadilla • Mar 27 '18 20:03

What metric exactly were you thinking of? Number of connections? Number of connections per second? Bytes in/out per second? Something else?

rade • Mar 28 '18 08:03

I was thinking about number of connections because that's already being collected and shown in the node details panel. The other metrics can be useful too, but my understanding is that those should be in a plugin, like scope-http-statistics.

It looks simple to expose the number of connections for each node in the topology, and the benefit to users is that we'll be able to access and visualize the data without having to open the details panel for each node. The node fill decorator will represent the number of connections to/from each node as a percentage of all connections.
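A minimal sketch of the fill value described above, assuming the topology's adjacency is available as a list of `(source, destination)` pairs. The function name `connection_share` and the edge-list representation are hypothetical, chosen only for illustration; this is not Scope's actual data model.

```python
from collections import Counter


def connection_share(edges):
    """Map each node to the percentage of all connections it takes part in.

    `edges` is a hypothetical list of (source, destination) pairs,
    one pair per connection.
    """
    total = len(edges)
    if total == 0:
        return {}
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        if dst != src:  # count a self-connection only once
            counts[dst] += 1
    return {node: 100.0 * n / total for node, n in counts.items()}


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
print(connection_share(edges))
```

Because every connection has two endpoints, the percentages across all nodes add up to more than 100 under this definition; normalizing against the busiest node instead would be an equally plausible choice.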

jlpadilla • Mar 28 '18 14:03

Why is "number of connections" an interesting metric for you? I can see why it may be useful occasionally, but it's not something we at Weaveworks have ever felt the need to pay attention to in the systems we are developing and operating.

Also note that the count shown in the details panel, like the actual connections themselves, represents an aggregate over the reporting interval, which by default is 15s. So, for example, if a component gets one connection per second, each lasting 100ms, these will all be shown and counted, resulting in a figure of 15.

The more useful figures to report would be number of connections per second, and number of concurrent connections. But these aren't readily available. And even if we had them, I suspect request rates and number of concurrent requests would be far more interesting, given the propensity of HTTP request pipelining. See #1631.
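To make the difference between the three figures concrete, here is a rough sketch of the arithmetic for the example above. It assumes a steady arrival rate and that every connection opened during the interval is observed at least once; the average concurrency uses Little's law (arrival rate times duration). The function name `interval_figures` is hypothetical.

```python
def interval_figures(rate_per_s, duration_s, interval_s=15.0):
    """Return (count seen in interval, connections per second, average concurrent).

    Assumes a steady arrival rate and that each connection is observed
    at least once during the reporting interval.
    """
    seen_in_interval = rate_per_s * interval_s
    avg_concurrent = rate_per_s * duration_s  # Little's law
    return seen_in_interval, rate_per_s, avg_concurrent


# One connection per second, each lasting 100ms, 15s reporting interval.
seen, per_second, concurrent = interval_figures(1.0, 0.1)
print(seen, per_second, concurrent)
```

Under these assumptions the aggregate figure is 15, while the rate is 1 per second and the average concurrency is only about 0.1.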

rade • Apr 02 '18 09:04

The motivation is to show network metrics at the topology level to help visualize network traffic and identify "hot" resources or bottlenecks. I thought that number of connections would be an easy first step because the data is already available; however, I agree that other metrics like connections per second and concurrent connections are more interesting.

I have working code showing the number of connections and can submit a pull request, but I'll take your advice on whether this metric provides additional value to the users or not.

jlpadilla • Apr 03 '18 20:04

As I said, I don't consider that particular "number of connections" figure particularly useful. The fact that it does not represent either of the obvious interpretations of that term - "number of concurrent connections" or "number of connections per second" - is troublesome too.

rade • Apr 10 '18 04:04

While working on #3709 I felt it would be useful to give separate numbers for established connections and TIME_WAIT connections, because the latter indicates an inefficiency that may be fixable with different software parameters (e.g. raising the size of a connection pool).
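As a rough illustration of the split, the sketch below counts established and TIME_WAIT sockets from text in the Linux `/proc/net/tcp` format, where the `st` column holds the state as a hex code (`01` is ESTABLISHED, `06` is TIME_WAIT). The function name `count_tcp_states` and the sample rows are made up for this example; it is not the code from #3709.

```python
from collections import Counter

# Hex state codes used in the `st` column of /proc/net/tcp on Linux.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT"}

# Made-up sample rows, trimmed to the leading columns.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F90 0100007F:C350 01 00000000:00000000
   1: 0100007F:1F90 0100007F:C351 06 00000000:00000000
   2: 0100007F:1F90 0100007F:C352 06 00000000:00000000
   3: 00000000:1F90 00000000:0000 0A 00000000:00000000
"""


def count_tcp_states(proc_net_tcp_text):
    """Count ESTABLISHED and TIME_WAIT entries; other states are ignored."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        name = TCP_STATES.get(fields[3].upper())
        if name is not None:
            counts[name] += 1
    return counts


counts = count_tcp_states(SAMPLE)
print(counts["ESTABLISHED"], counts["TIME_WAIT"])
```

A high TIME_WAIT count relative to the established count would be the signal mentioned above that connections are being opened and closed more often than necessary.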

bboreham • Oct 13 '19 13:10