Get network statistics per container

Open • ashangit opened this issue 6 years ago • 8 comments

Some of our users would like to have network statistics per container, such as:

  • number of bytes
  • number of packets
  • errors/drops, etc.

To get those metrics, we can rely on the /proc/<pid>/net/dev file and use the same mechanism as the process tree that NodeManagers use to get the memory/vcores used by each container.
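A minimal Java sketch of the parsing side of this idea, assuming the standard /proc/net/dev layout: two header lines, then one line per interface with 8 receive counters followed by 8 transmit counters. The names `NetDevReader` and `IfStats` are hypothetical and not part of garmadon. As pointed out later in this thread, the file reports interface-level counters for the network namespace the process belongs to, not traffic generated by that process.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: parses /proc/<pid>/net/dev into per-interface counters. */
final class NetDevReader {

    /** Selected counters for one interface (a subset of the 16 columns). */
    static final class IfStats {
        final long rxBytes, rxPackets, rxErrs, rxDrop;
        final long txBytes, txPackets, txErrs, txDrop;

        IfStats(long[] v) {
            // Columns 0-7 are receive counters, columns 8-15 transmit counters.
            rxBytes = v[0]; rxPackets = v[1]; rxErrs = v[2]; rxDrop = v[3];
            txBytes = v[8]; txPackets = v[9]; txErrs = v[10]; txDrop = v[11];
        }
    }

    /** Parses the file contents; the first two lines are column headers. */
    static Map<String, IfStats> parse(List<String> lines) {
        Map<String, IfStats> result = new LinkedHashMap<>();
        for (int i = 2; i < lines.size(); i++) {
            String line = lines.get(i);
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String iface = line.substring(0, colon).trim();
            // Splitting on the colon also handles a first counter glued to it.
            String[] fields = line.substring(colon + 1).trim().split("\\s+");
            if (fields.length < 16) continue;
            long[] v = new long[16];
            for (int j = 0; j < 16; j++) {
                v[j] = Long.parseLong(fields[j]);
            }
            result.put(iface, new IfStats(v));
        }
        return result;
    }

    /** Reads the counters as seen from the network namespace of the given pid. */
    static Map<String, IfStats> read(long pid) throws IOException {
        return parse(Files.readAllLines(Paths.get("/proc", Long.toString(pid), "net", "dev")));
    }

    private NetDevReader() {}
}
```

On a Linux host, `NetDevReader.read(pid)` would return one entry per interface visible from that process, e.g. `lo` and `eth0`.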

ashangit • Nov 20 '18 11:11

FYI, Oshi can already provide this information through NetworkIF: https://github.com/oshi/oshi/blob/master/oshi-core/src/main/java/oshi/hardware/NetworkIF.java#L207

so you can add them to: https://github.com/criteo/garmadon/blob/master/jvm-statistics/core/src/main/java/com/criteo/jvm/statistics/NetworkStatistics.java

jpbempel • Nov 20 '18 12:11

As I understand it, this would provide metrics from the OS point of view, not per container, right?

ashangit • Nov 20 '18 16:11

@ashangit /proc/<pid>/net/dev provides interface-level stats, NOT per-process ones! So it doesn't change anything.

jpbempel • Nov 21 '18 06:11

AFAICT packet and error counters are kept at the interface level, so you cannot get them per process. You can get bytes received/sent in other ways, for example at the Hadoop level, but not packets or errors, which carry no information about which socket they are associated with.
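To illustrate the distinction: bytes can be counted at the application layer, where the code knows which connection it is reading from, while packets, drops and errors only exist below that layer. The sketch below is a hypothetical wrapper (not Hadoop's actual statistics API) that counts bytes read from any `InputStream`, such as a socket's input stream; it cannot observe packets, retransmissions or interface errors.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: counts the bytes an application reads from a stream. */
final class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++; // -1 means end of stream, nothing was read
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) delegates here, so it is counted too.
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skipped bytes were still received, so they are counted as well.
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    long getCount() {
        return count;
    }
}
```

Such a counter only works inside the process doing the I/O, which is why it does not help for non-JVM containers.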

jpbempel • Nov 21 '18 07:11

OK, my bad. So we need to find another way. We also can't get it from Hadoop, as we have more and more non-JVM containers (Python, TensorFlow...).

ashangit • Nov 21 '18 09:11

@ashangit It doesn't seem obvious. nethogs (https://github.com/raboof/nethogs) uses libpcap to decode packet headers and read each packet's length, to know the real-time bandwidth used by each process. But it cannot get history, so you would need to run it permanently to get the total quantity of bytes/packets received/sent. I don't think this is sustainable for our use case.
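The accounting step such a tool performs can be sketched as follows. This is a hypothetical model, not nethogs' code: packet capture is replaced by a list of already-decoded (local port, length) pairs, and the port-to-pid table is passed in rather than built from the kernel's socket tables. It also shows the limitation mentioned above: totals exist only inside the running accumulator, and packets for sockets missing from the table cannot be attributed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of per-process byte accounting from captured packets. */
final class PerPidAccounting {

    /** A captured packet, reduced to its local port and its length in bytes. */
    static final class Packet {
        final int localPort;
        final int length;

        Packet(int localPort, int length) {
            this.localPort = localPort;
            this.length = length;
        }
    }

    private final Map<Long, Long> bytesPerPid = new HashMap<>();
    private long unattributedBytes;

    /** Adds each packet's length to the pid owning its local port right now. */
    void account(List<Packet> packets, Map<Integer, Long> portToPid) {
        for (Packet p : packets) {
            Long pid = portToPid.get(p.localPort);
            if (pid == null) {
                // Socket already closed or unknown: no process to charge it to.
                unattributedBytes += p.length;
            } else {
                bytesPerPid.merge(pid, (long) p.length, Long::sum);
            }
        }
    }

    long bytesFor(long pid) {
        return bytesPerPid.getOrDefault(pid, 0L);
    }

    long unattributed() {
        return unattributedBytes;
    }
}
```

If the accumulator is restarted, all totals start again from zero, which is the "no history" problem.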

Honestly, I don't see a solution for attributing network activity per process.

jpbempel • Nov 21 '18 10:11

Well, after some research, I may have found a solution: ss -tinp

State       Recv-Q Send-Q                   Local Address:Port                                  Peer Address:Port
ESTAB       0      0                            10.0.2.15:22                                        10.0.2.2:57375               users:(("sshd",pid=9990,fd=3))
         cubic rto:201 rtt:0.241/0.054 ato:40 mss:1460 cwnd:10 bytes_acked:94885 bytes_received:43212 segs_out:303 segs_in:459 send 484.6Mbps lastsnd:757714 lastrcv:756996 lastack:756996 pacing_rate 965.8Mbps rcv_rtt:311220 rcv_space:29532
ESTAB       0      0                            10.0.2.15:22                                        10.0.2.2:56338               users:(("sshd",pid=7363,fd=3))
         cubic rto:201 rtt:0.449/0.182 ato:47 mss:1460 cwnd:10 ssthresh:16 bytes_acked:899733 bytes_received:255916 segs_out:4888 segs_in:8044 send 260.1Mbps lastsnd:15 lastrcv:16 lastack:15 pacing_rate 519.8Mbps rcv_rtt:451431 rcv_space:78608
ESTAB       0      0                            10.0.2.15:58400                                89.30.125.167:25                  users:(("telnet",pid=15543,fd=3))
         cubic rto:205 rtt:4.11/2.055 ato:40 mss:1460 cwnd:10 bytes_acked:1 bytes_received:24 segs_out:3 segs_in:2 send 28.4Mbps lastsnd:531253 lastrcv:531235 lastack:531235 pacing_rate 56.4Mbps rcv_space:29200
ESTAB       0      0                            10.0.2.15:22                                        10.0.2.2:50323               users:(("sshd",pid=2421,fd=3))
         cubic rto:201 rtt:0.654/0.244 ato:40 mss:1460 cwnd:8 ssthresh:7 bytes_acked:3058309 bytes_received:1102444 segs_out:19837 segs_in:35236 send 142.9Mbps lastsnd:531234 lastrcv:531432 lastack:531233 pacing_rate 285.6Mbps retrans:0/4 rcv_rtt:240486 rcv_space:54912
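A minimal sketch of turning this output into per-process totals, assuming the `ss -tinp` text has already been captured (for instance with `ProcessBuilder`) and that each socket line is followed by one indented TCP info line, as above. Only `bytes_acked` (bytes sent and acknowledged by the peer) and `bytes_received` are summed, and a socket shared by several processes is attributed to the first pid listed; the class name is hypothetical. These counters only live as long as each socket, so connections closed between two polls are missed, and mapping pids to containers remains a separate step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser: sums TCP byte counters from `ss -tinp` output per pid. */
final class SsTcpInfoParser {
    private static final Pattern PID = Pattern.compile("pid=(\\d+)");
    private static final Pattern ACKED = Pattern.compile("\\bbytes_acked:(\\d+)");
    private static final Pattern RECEIVED = Pattern.compile("\\bbytes_received:(\\d+)");

    /** Returns, per pid, {sum of bytes_acked, sum of bytes_received}. */
    static Map<Long, long[]> sumPerPid(List<String> lines) {
        Map<Long, long[]> totals = new LinkedHashMap<>();
        Long currentPid = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            if (!Character.isWhitespace(line.charAt(0))) {
                // Socket line ("ESTAB ..."); the header line has no pid and resets state.
                Matcher m = PID.matcher(line);
                currentPid = m.find() ? Long.valueOf(m.group(1)) : null;
            } else if (currentPid != null) {
                // Indented TCP info line belonging to the preceding socket line.
                long[] t = totals.computeIfAbsent(currentPid, k -> new long[2]);
                t[0] += firstLong(ACKED, line);
                t[1] += firstLong(RECEIVED, line);
                currentPid = null; // count each socket's info line only once
            }
        }
        return totals;
    }

    /** First numeric capture of the pattern in the line, or 0 if absent. */
    private static long firstLong(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
    }

    private SsTcpInfoParser() {}
}
```

Note that `-p` generally only reports the owning process for sockets the caller is allowed to inspect, so collecting this for all containers on a node would likely need elevated privileges.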

jpbempel • Nov 22 '18 10:11

Looks like a good start; I just have some concerns about the impact it could have on loaded servers. Let's discuss it IRL next week.

ashangit • Nov 22 '18 19:11