bcc Incorrect calculation of recv throughput in tcptop.py

tcp_cleanup_rbuf is called several times in a loop inside tcp_recvmsg. At the same time, it is passed the accumulated parameter "copied". Thus, the same data is taken into account several times.

int tcp_recvmsg(...) {
     ...
     int copied = 0;
     ...
     do {
         ...
         tcp_cleanup_rbuf(sk, copied);
         ...
         copied += used;
         ...
     } while (len > 0);
     
     tcp_cleanup_rbuf(sk, copied);
     return copied;
     ...
}

Jul 04 '25 14:07 sc07kvm

Could you elaborate ? I don't see tcpdrop do anything to do with tcp_cleanup_rbuf.

Jul 08 '25 12:07 chenhengqi

Of course, on newer kernels(>=5.11) this happens via the tcp_recvmsg_locked function, which is inlined:

static int tcp_recvmsg_locked() {
    ...
    do {
        ...
        tcp_cleanup_rbuf(...);
        ...
    } while (...);
     
    tcp_cleanup_rbuf(...);
    ...
}

int tcp_recvmsg(...) {
    ...
    ret = tcp_recvmsg_locked(...);
    ...
}

If you look at older versions, you can see the call directly in tcp_recvmsg - https://github.com/torvalds/linux/blob/2c85ebc57b3e1817b6ce1a6b703928e113a90442/net/ipv4/tcp.c#L2159

Jul 08 '25 13:07 sc07kvm

How does this affect the behavior of tcpdrop.py ?

Jul 09 '25 01:07 chenhengqi

tcpdrop.py hooks tcp_cleanup_rbuf to count the amount of data received based on the "copied" argument:

int kprobe__tcp_cleanup_rbuf(..., int copied) {
    ...
    ipv4_recv_bytes.increment(ipv4_key, copied);
    ....
}

For example, if tcp_cleanup_rbuf function is called twice within a single tcp_recvmsg call with "copied" argumen of 10 and 20, the ipv4_recv_bytes counter will end up being incremented by 30 instead of 20.:

--> tcp_recvmsg()
    --> tcp_cleanup_rbuf(..., copied = 10)
    --> tcp_cleanup_rbuf(..., copied = 20)

Jul 10 '25 04:07 sc07kvm

OK, could you help with a PR ?

Jul 10 '25 07:07 chenhengqi

I have no idea how to fix this at all

Jul 10 '25 15:07 sc07kvm

The following could be an possible way to do it:

Add two bpf progs to trace tcp_recvmsg() entry/exit.

At tcp_recvmsg() entry: put 'sk' into a hashmap. The value is copied = 0 When tracing tcp_cleanup_rbuf, if sk is in hashmap, updated copied value At tcp_recvmsg exit, retrieve the sk to copied mapping get 'copied' value and remove 'sk' from hashmap.

During tcp_cleanup_rbuf tracing if sk is NOT in hashmap, directly add copied to overall accounting.

Jul 13 '25 16:07 yonghong-song

Add two bpf progs to trace tcp_recvmsg() entry/exit.

you can't do that. kretprobe has a limit on the number of functions that can be executed simultaneously (maxactive parameter). On waiting functions like tcp_recvmsg, this will cause losses.

Jul 14 '25 09:07 sc07kvm

Add two bpf progs to trace tcp_recvmsg() entry/exit.

you can't do that. kretprobe has a limit on the number of functions that can be executed simultaneously (maxactive parameter). On waiting functions like tcp_recvmsg, this will cause losses.

How about we use kfunc and kretfunc?

Jul 14 '25 18:07 yonghong-song

Using fentry/fexit will greatly reduce the scope of supported OS. Kernel >= v5.5 + CONFIG_DEBUG_INFO_BTF flag

Jul 15 '25 13:07 sc07kvm

Using fentry/fexit will greatly reduce the scope of supported OS. Kernel >= v5.5 + CONFIG_DEBUG_INFO_BTF flag

That is really unfortunate. But since it is not accurate any way, so I think we could move current tcptop.py to old directory and add fentry/fexit support in tcptop.py and add comments in the beginning of the tool to mention it only works for kernel >= 5.5.

Jul 17 '25 06:07 yonghong-song