Incorrect calculation of recv throughput in tcptop.py
tcp_cleanup_rbuf is called several times in a loop inside tcp_recvmsg. At the same time, it is passed the accumulated parameter "copied". Thus, the same data is taken into account several times.
int tcp_recvmsg(...) {
...
int copied = 0;
...
do {
...
tcp_cleanup_rbuf(sk, copied);
...
copied += used;
...
} while (len > 0);
tcp_cleanup_rbuf(sk, copied);
return copied;
...
}
Could you elaborate ? I don't see tcpdrop do anything to do with tcp_cleanup_rbuf.
Of course, on newer kernels(>=5.11) this happens via the tcp_recvmsg_locked function, which is inlined:
static int tcp_recvmsg_locked() {
...
do {
...
tcp_cleanup_rbuf(...);
...
} while (...);
tcp_cleanup_rbuf(...);
...
}
int tcp_recvmsg(...) {
...
ret = tcp_recvmsg_locked(...);
...
}
If you look at older versions, you can see the call directly in tcp_recvmsg - https://github.com/torvalds/linux/blob/2c85ebc57b3e1817b6ce1a6b703928e113a90442/net/ipv4/tcp.c#L2159
How does this affect the behavior of tcpdrop.py ?
tcpdrop.py hooks tcp_cleanup_rbuf to count the amount of data received based on the "copied" argument:
int kprobe__tcp_cleanup_rbuf(..., int copied) {
...
ipv4_recv_bytes.increment(ipv4_key, copied);
....
}
For example, if tcp_cleanup_rbuf function is called twice within a single tcp_recvmsg call with "copied" argumen of 10 and 20, the ipv4_recv_bytes counter will end up being incremented by 30 instead of 20.:
--> tcp_recvmsg()
--> tcp_cleanup_rbuf(..., copied = 10)
--> tcp_cleanup_rbuf(..., copied = 20)
OK, could you help with a PR ?
I have no idea how to fix this at all
The following could be an possible way to do it:
- Add two bpf progs to trace tcp_recvmsg() entry/exit.
At tcp_recvmsg() entry: put 'sk' into a hashmap. The value is copied = 0 When tracing tcp_cleanup_rbuf, if sk is in hashmap, updated copied value At tcp_recvmsg exit, retrieve the sk to copied mapping get 'copied' value and remove 'sk' from hashmap.
During tcp_cleanup_rbuf tracing if sk is NOT in hashmap, directly add copied to overall accounting.
- Add two bpf progs to trace tcp_recvmsg() entry/exit.
you can't do that. kretprobe has a limit on the number of functions that can be executed simultaneously (maxactive parameter). On waiting functions like tcp_recvmsg, this will cause losses.
- Add two bpf progs to trace tcp_recvmsg() entry/exit.
you can't do that. kretprobe has a limit on the number of functions that can be executed simultaneously (maxactive parameter). On waiting functions like tcp_recvmsg, this will cause losses.
How about we use kfunc and kretfunc?
Using fentry/fexit will greatly reduce the scope of supported OS. Kernel >= v5.5 + CONFIG_DEBUG_INFO_BTF flag
Using fentry/fexit will greatly reduce the scope of supported OS. Kernel >= v5.5 + CONFIG_DEBUG_INFO_BTF flag
That is really unfortunate. But since it is not accurate any way, so I think we could move current tcptop.py to old directory and add fentry/fexit support in tcptop.py and add comments in the beginning of the tool to mention it only works for kernel >= 5.5.