
Add top_net similar to top_io

Open balta2ar opened this issue 8 years ago • 6 comments

Hello. I wish there were a command to show "top N processes doing network I/O" (which should not include disk I/O shown by top_io). Perhaps ideas on how to filter network I/O per process could be taken from https://github.com/raboof/nethogs.

balta2ar avatar Apr 04 '16 07:04 balta2ar

I've always missed this last puzzle piece in my conky configuration. Conky shows total network usage, but occasionally you need to see exactly which process is consuming bandwidth. Running nethogs is not always convenient — that's why we have conky, after all.

Today I've managed to build something that I've been dreaming of for a long time. See the picture below, Networking section. You can see that aria2c is actively downloading, and Chromium is actively uploading (that's speedtest.net). Measurements are in KBytes. The output more or less corresponds to the conky graphs just above.

[screenshot: systemtap-nettop]

This was possible thanks to SystemTap. I'm using ArchLinux, my kernel is 4.7.1, and unfortunately, things didn't work out of the box. First, I had to compile the kernel with debug info; then I had to use the most recent SystemTap version from git, because the 3.0 release is already incompatible with my current kernel. In ArchLinux, though, it was rather easy to do. Looking back, I can't say with confidence that having debug info is necessary. I'll probably find out when a new kernel arrives.

Next I took this SystemTap example as a starting point: https://sourceware.org/systemtap/examples/network/nettop.stp. It didn't show meaningful results, so instead of using the netdev probes, I switched to tcp.sendmsg and tcp.recvmsg (you'll need to adjust my script if you need UDP; I also tried the ip probe, but it displayed weird results as well). Here (https://gist.github.com/balta2ar/3d8070deccdbac569b4d3fab1de00f9b) is the script:

#! /usr/bin/env stap

global ifxmit, ifrecv
global ifmerged

probe tcp.sendmsg.return
{
  if (size > 0) {
    ifxmit[pid(), "eth0", execname(), uid()] <<< size
  }
}

probe tcp.recvmsg.return
{
  if (size > 0) {
    ifrecv[pid(), "eth0", execname(), uid()] <<< size
  }
}

function print_activity()
{
  printf("${color1}%-17s %5s %6s %6s\n",
         "Name", "PID", "Up", "Down")

  foreach ([pid, dev, exec, uid] in ifrecv) {
    ifmerged[pid, dev, exec, uid] += @sum(ifrecv[pid,dev,exec,uid]);
  }
  foreach ([pid, dev, exec, uid] in ifxmit) {
    ifmerged[pid, dev, exec, uid] += @sum(ifxmit[pid,dev,exec,uid]);
  }
  counter = 0
  foreach ([pid, dev, exec, uid] in ifmerged-) {
    n_xmit = @count(ifxmit[pid, dev, exec, uid])
    n_recv = @count(ifrecv[pid, dev, exec, uid])
    sent = n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0
    recv = n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0
    printf("${color2} %-16s %5d %6d %6d\n",
           exec, pid, sent, recv)
    counter += 1
    if (counter >= 4) {
      break
    }
  }
  while (counter < 4) {
    print("\n")
    counter += 1
  }

  delete ifxmit
  delete ifrecv
  delete ifmerged
}

probe timer.ms(1000), end, error
{
  print_activity()
}

The script prints the top 4 processes that generate TCP traffic, in a format suitable for my conky configuration (I'm using colors). The order is descending by the sum of sent + received traffic. As you can see, there may be fewer than 4 processes; in that case the script prints newlines to pad the output to 4 lines. That's just to fit nicely into my conky.
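Independent of SystemTap, the select-top-4-and-pad behavior described above can be sketched in Python (hypothetical names, just to illustrate the formatting contract):

```python
def format_top(traffic, height=4):
    """traffic: {(name, pid): (sent_kb, recv_kb)} for one sampling interval.
    Return `height` lines: busiest processes first, padded with blanks so the
    conky section keeps a fixed height."""
    rows = sorted(traffic.items(),
                  key=lambda kv: kv[1][0] + kv[1][1],  # sent + recv, descending
                  reverse=True)[:height]
    lines = ["%-16s %5d %6d %6d" % (name, pid, sent, recv)
             for (name, pid), (sent, recv) in rows]
    lines += [""] * (height - len(lines))  # pad to a fixed number of lines
    return lines

print("\n".join(format_top({("aria2c", 101): (10, 500),
                            ("chromium", 202): (300, 20)})))
```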

I run it as follows:

sudo stap -o /tmp/nettop.log -S 1,1 nettop.stp

The -S option specifies the maximum size of the output file and how many rotated files to keep; once the size exceeds N MBytes, the output is rotated. In fact I don't need the output to grow even that large, but I didn't find a better way: stap's -S argument treats the value as MB. The output is appended to /tmp/nettop.log.0 every second.

The last part is conky. I added this line to display the top 4 processes:

${execpi 1 tail -5 /tmp/nettop.log.0}

All of this feels hacky, but it does the job. Hope it helps somebody.

balta2ar avatar Aug 20 '16 19:08 balta2ar

Hello, I'm reopening this topic to recommend a daemon developed by a good friend of mine, which writes the output of nethogs to a flat file so it can be added to conky very simply. It can also exclude the processes of certain users so as not to clutter the output file. Here is his repo on GitHub, along with a sample of what the script does.

https://github.com/schcriher/nethogs-daemon

[screenshot: photo_2017-08-29_21-00-28]

Best regards!

CocoCocoAlx avatar Sep 06 '17 14:09 CocoCocoAlx

Thanks for the heads-up! By the way, I noticed that the bcc tools have developed significantly recently, and I took advantage of that. The tcptop tool is what I was looking for. I adapted it slightly to my needs; here is how it looks right now: [screenshot: conky-nettop]

The idea is to run a bcc program in the background and let it write into a file. conky, in turn, reads from that file and displays the current stats.

Adapted code:

Adapted tcptop code
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# tcptop    Summarize TCP send/recv throughput by host.
#           For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: tcptop [-h] [-C] [-S] [-p PID] [interval [count]]
#
# This uses dynamic tracing of kernel functions, and will need to be updated
# to match kernel changes.
#
# WARNING: This traces all send/receives at the TCP level, and while it
# summarizes data in-kernel to reduce overhead, there may still be some
# overhead at high TCP send/receive rates (eg, ~13% of one CPU at 100k TCP
# events/sec. This is not the same as packet rate: funccount can be used to
# count the kprobes below to find out the TCP rate). Test in a lab environment
# first. If your send/receive rate is low (eg, <1k/sec) then the overhead is
# expected to be negligible.
#
# ToDo: Fit output to screen size (top X only) in default (not -C) mode.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 02-Sep-2016   Brendan Gregg   Created this.

from __future__ import print_function
from bcc import BPF
import argparse
from socket import inet_ntop, AF_INET, AF_INET6
from struct import pack
from time import sleep, strftime
from subprocess import call
import ctypes as ct
import os
import sys

# arguments
def range_check(string):
    value = int(string)
    if value < 1:
        msg = "value must be strictly positive, got %d" % (value,)
        raise argparse.ArgumentTypeError(msg)
    return value

examples = """examples:
    ./tcptop           # trace TCP send/recv by host
    ./tcptop -C        # don't clear the screen
    ./tcptop -p 181    # only trace PID 181
"""
parser = argparse.ArgumentParser(
    description="Summarize TCP send/recv throughput by host",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-C", "--noclear", action="store_true",
    help="don't clear the screen")
parser.add_argument("-S", "--nosummary", action="store_true",
    help="skip system summary line")
parser.add_argument("-p", "--pid",
    help="trace this PID only")
parser.add_argument("-o", "--output", default=None,
    help="output file")
parser.add_argument("interval", nargs="?", default=1, type=range_check,
    help="output interval, in seconds (default 1)")
parser.add_argument("count", nargs="?", default=-1, type=range_check,
    help="number of outputs")
args = parser.parse_args()
debug = 0

# linux stats
loadavg = "/proc/loadavg"

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>

struct ipv4_key_t {
    u32 pid;
};
BPF_HASH(ipv4_send_bytes, struct ipv4_key_t);
BPF_HASH(ipv4_recv_bytes, struct ipv4_key_t);

struct ipv6_key_t {
    u32 pid;
    // workaround until unsigned __int128 support:
};
BPF_HASH(ipv6_send_bytes, struct ipv6_key_t);
BPF_HASH(ipv6_recv_bytes, struct ipv6_key_t);

int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk,
    struct msghdr *msg, size_t size)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    FILTER
    u16 dport = 0, family = sk->__sk_common.skc_family;
    u64 *val, zero = 0;

    if (family == AF_INET) {
        struct ipv4_key_t ipv4_key = {.pid = pid};
        val = ipv4_send_bytes.lookup_or_init(&ipv4_key, &zero);
        (*val) += size;

    } else if (family == AF_INET6) {
        struct ipv6_key_t ipv6_key = {.pid = pid};
        val = ipv6_send_bytes.lookup_or_init(&ipv6_key, &zero);
        (*val) += size;
    }
    // else drop

    return 0;
}

/*
 * tcp_recvmsg() would be obvious to trace, but is less suitable because:
 * - we'd need to trace both entry and return, to have both sock and size
 * - misses tcp_read_sock() traffic
 * we'd much prefer tracepoints once they are available.
 */
int kprobe__tcp_cleanup_rbuf(struct pt_regs *ctx, struct sock *sk, int copied)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    FILTER
    u16 dport = 0, family = sk->__sk_common.skc_family;
    u64 *val, zero = 0;

    if (copied <= 0)
        return 0;

    if (family == AF_INET) {
        struct ipv4_key_t ipv4_key = {.pid = pid};
        val = ipv4_recv_bytes.lookup_or_init(&ipv4_key, &zero);
        (*val) += copied;

    } else if (family == AF_INET6) {
        struct ipv6_key_t ipv6_key = {.pid = pid};
        val = ipv6_recv_bytes.lookup_or_init(&ipv6_key, &zero);
        (*val) += copied;
    }
    // else drop

    return 0;
}
"""

# code substitutions
if args.pid:
    bpf_text = bpf_text.replace('FILTER',
        'if (pid != %s) { return 0; }' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '')
if debug:
    print(bpf_text)

def pid_to_comm(pid):
    try:
        comm = open("/proc/%d/cmdline" % pid, "r").read().rstrip().replace('\x00', ' ')
        parts = comm.split()
        if len(parts) > 0:
            parts[0] = os.path.basename(parts[0])
        comm = ' '.join(parts)
        #comm = open("/proc/%d/comm" % pid, "r").read().rstrip()
        return comm
    except IOError:
        return str(pid)

# initialize BPF
b = BPF(text=bpf_text)

ipv4_send_bytes = b["ipv4_send_bytes"]
ipv4_recv_bytes = b["ipv4_recv_bytes"]
ipv6_send_bytes = b["ipv6_send_bytes"]
ipv6_recv_bytes = b["ipv6_recv_bytes"]

print('Tracing... Output every %s secs. Hit Ctrl-C to end' % args.interval)

# output
i = 0
exiting = False

while i != args.count and not exiting:
    try:
        sleep(args.interval)
    except KeyboardInterrupt:
        exiting = True

    if not args.nosummary:
        with open(loadavg) as stats:
            print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))

    output = sys.stdout if args.output is None else open(args.output, 'w')

    # IPv4:  build dict of all seen keys
    keys = ipv4_recv_bytes
    for k, v in ipv4_send_bytes.items():
        if k not in keys:
            keys[k] = v

    # output
    for k, v in reversed(sorted(keys.items(), key=lambda keys: keys[1].value)):
        send_kbytes = 0
        if k in ipv4_send_bytes:
            send_kbytes = int(ipv4_send_bytes[k].value / 1024)
        recv_kbytes = 0
        if k in ipv4_recv_bytes:
            recv_kbytes = int(ipv4_recv_bytes[k].value / 1024)

        print(" %-15.15s %6d %6d %6d" % (
            pid_to_comm(k.pid),
            k.pid,
            recv_kbytes, send_kbytes), file=output)

    ipv4_send_bytes.clear()
    ipv4_recv_bytes.clear()

    i += 1

    if output != sys.stdout:
        output.close()
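One caveat with this scheme (not handled above): the loop reopens the output file with 'w' every interval, so conky can occasionally catch the file half-written. A write-then-rename pattern avoids that; `write_stats` is a hypothetical helper, not part of the original tool:

```python
import os
import tempfile

def write_stats(path, lines):
    """Write lines to a temp file in the same directory, then atomically
    rename it over `path`, so a reader (conky) always sees either the old
    or the new contents, never a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp, path)  # atomic on POSIX filesystems

write_stats("/tmp/nettop.demo", ["curl             4242    300     12"])
print(open("/tmp/nettop.demo").read(), end="")
```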

balta2ar avatar Sep 06 '17 14:09 balta2ar

I am trying to do that with nethogs. For now I can get the name of the top process using the network in conky with: nethogs -c 2 -d 5 -t | grep '^/' | head -1 | rev | cut -d '/' -f3 --output-delimiter="1000" | rev — work in progress, lol.

ghiles64 avatar May 11 '20 19:05 ghiles64

Hello! I want to extract bandwidth from the nethogs command and save it in a JSON file. Any help?

Gtorjmene avatar Jul 23 '21 10:07 Gtorjmene

nethogs -c 2 -d 5 -t 2>/dev/null | grep '^/' |head -5 | rev | cut -d '/' -f1-3 --output-delimiter="1000"|rev|sed "s/[0-9]*\t/\t/"

...preserves the process name, sent KB/sec, and received KB/sec
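For the JSON question above: assuming `nethogs -t` emits data lines of the form `<program path>/<pid>/<uid>` followed by tab-separated sent and received KB/sec (which is what the pipeline above slices up), a small Python sketch can turn them into JSON:

```python
import json

def parse_nethogs_line(line):
    """Parse one assumed `nethogs -t` data line:
    '<path>/<pid>/<uid>\\t<sent_kb>\\t<recv_kb>' -> dict."""
    proc, sent, recv = line.rstrip("\n").split("\t")
    path, pid, uid = proc.rsplit("/", 2)  # pid and uid are the last two fields
    return {
        "name": path.rsplit("/", 1)[-1],  # basename of the program path
        "pid": int(pid),
        "uid": int(uid),
        "sent_kb": float(sent),
        "recv_kb": float(recv),
    }

sample = "/usr/bin/curl/4242/1000\t12.5\t300.0"
print(json.dumps(parse_nethogs_line(sample)))
```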

glenstewart avatar May 25 '22 22:05 glenstewart

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment, or this issue will be closed in 30 days.

github-actions[bot] avatar Dec 14 '23 01:12 github-actions[bot]

This issue was closed because it has been stalled for 30 days with no activity.

github-actions[bot] avatar Jan 13 '24 01:01 github-actions[bot]