bcc
tools/tcpdrop.py: I was running this tool but got a PID of 0. Is it a bug?
Hi @linrl3, I see the secondary_startup function in the stack. Maybe this is the swapper process, whose PID is 0.
Yes, I added a comm column to the output, and it shows that the process with PID 0 is named swapper.
So it's not a bug. But I wonder why swapper interacts with the network stack.
This is indeed a strange case. Typically, the networking softirq is processed by the ksoftirqd kernel thread, which has a non-zero PID. But in this case, I think it happens because of the following code:
```c
static inline void invoke_softirq(void)
{
	if (ksoftirqd_running(local_softirq_pending()))
		return;

	if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
		/*
		 * We can safely execute softirq on the current stack if
		 * it is the irq stack, because it should be near empty
		 * at this stage.
		 */
		__do_softirq();
#else
		/*
		 * Otherwise, irq_exit() is called on the task stack that can
		 * be potentially deep already. So call softirq in its own stack
		 * to prevent from any overrun.
		 */
		do_softirq_own_stack();
#endif
	} else {
		wakeup_softirqd();
	}
}
```
It looks like the following kernel option can make softirqs get handled immediately after a hardware interrupt, without going through ksoftirqd:
```
threadirqs	[KNL]
		Force threading of all interrupt handlers except those
		marked explicitly IRQF_NO_THREAD.
```
Another possibility is that ksoftirqd is not running on a particular CPU but a softirq is somehow triggered, so it has to be run inline. This is less likely, though.
@yonghong-song I also found the PID 0 issue in other tools like tcprtt after printing the PID obtained from bpf_get_current_pid_tgid.
The kernel handles softirqs on two paths: one is irq_exit() -> invoke_softirq(), the other is the ksoftirqd kthread (run_ksoftirqd() -> __do_softirq()). PID 0 is a normal situation: a hardware IRQ interrupts the CPU while it is idle, the softirq then runs on IRQ exit in the context of the idle task, and so bpf_get_current_pid_tgid() returns 0.
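For reference, bpf_get_current_pid_tgid() returns a 64-bit value with the thread-group ID (what user space calls the PID) in the upper 32 bits and the kernel thread ID in the lower 32 bits; BPF tools usually shift right by 32 to get the PID. A small Python sketch of that split, where split_pid_tgid is a hypothetical helper and not a bcc API, shows why a softirq that lands on the idle task reports 0 for both halves:

```python
from typing import Tuple


def split_pid_tgid(pid_tgid: int) -> Tuple[int, int]:
    """Split a bpf_get_current_pid_tgid() value into (tgid, tid).

    Upper 32 bits: thread-group ID (the PID seen from user space).
    Lower 32 bits: kernel task ID (the thread ID).
    """
    tgid = pid_tgid >> 32
    tid = pid_tgid & 0xFFFFFFFF
    return tgid, tid


# A softirq that interrupted the idle task (swapper) sees 0 for both halves.
print(split_pid_tgid(0))                    # (0, 0)

# A hypothetical worker thread 1240 inside process 1234.
print(split_pid_tgid((1234 << 32) | 1240))  # (1234, 1240)
```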
@netedwardwu You are right. It is a normal path. My previous statement about the impact of the kernel option threadirqs is incorrect. That kernel option sets force_irqthreads to true, but the default value of force_irqthreads is false, so by default __do_softirq will be called after a hardware interrupt if ksoftirqd is not already running (i.e., it is not the task being interrupted).
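The corrected reading of invoke_softirq() can be captured in a small Python model. This is a simplified sketch, assuming the three inputs are plain booleans standing in for ksoftirqd_running(), force_irqthreads and __this_cpu_read(ksoftirqd); softirq_path is a hypothetical name, and details such as SOFTIRQ_NOW_MASK inside ksoftirqd_running() are ignored:

```python
def softirq_path(ksoftirqd_running: bool, force_irqthreads: bool,
                 has_ksoftirqd: bool) -> str:
    """Return where invoke_softirq() sends pending softirqs (simplified model)."""
    if ksoftirqd_running:
        # ksoftirqd is already runnable on this CPU; let it do the work.
        return "ksoftirqd (already running)"
    if not force_irqthreads or not has_ksoftirqd:
        # __do_softirq() / do_softirq_own_stack() runs right here, in the
        # context of the interrupted task: swapper (PID 0) if the CPU was idle.
        return "inline on irq exit"
    # Booted with threadirqs and ksoftirqd exists: defer to the kthread.
    return "wake ksoftirqd"


# Default boot (no threadirqs): softirqs run inline after the hardware IRQ.
print(softirq_path(False, False, True))  # inline on irq exit
# Booted with threadirqs: the work is handed to ksoftirqd.
print(softirq_path(False, True, True))   # wake ksoftirqd
```

The second branch also covers the less likely case mentioned earlier in the thread, where no ksoftirqd exists on the CPU and the softirq has to run inline even with threadirqs set.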
@linrl3 The tcprtt case should be the same as tcpdrop.py: tcp_rcv_established is called from softirq (the packet receive path).
@yonghong-song In this case, tcp_rcv_established is called from softirq. If it's not the idle process, is the PID we captured the same as the PID of the process that created the TCP connection?
@linrl3 No! The PID captured in softirq context is meaningless. You can test this by running one process that only burns CPU and does no network work, and another process that does the network work. You will find that all the PIDs captured in softirq belong to the CPU-burning process.
@netedwardwu Yes, I ran a CPU-draining process and a bcc program that probes tcp_rcv_established. Most of the PIDs captured were the PID of the CPU-draining process.
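A toy Python simulation of that experiment, assuming a single CPU whose current task is whichever process was running when the NIC interrupt fired (captured_pids and all PIDs are hypothetical). Every packet belongs to the networking process's socket, but a softirq-context probe can only see current, so the CPU hog dominates the counts:

```python
import random
from collections import Counter


def captured_pids(n_irqs: int, busy_pid: int, net_pid: int,
                  busy_share: float, seed: int = 0) -> Counter:
    """Toy single-CPU model: count which PID a softirq-context probe records.

    busy_share is the fraction of time the CPU-burning process is on the CPU.
    """
    rng = random.Random(seed)
    seen: Counter = Counter()
    for _ in range(n_irqs):
        # Task that happens to be running when the interrupt arrives.
        current = busy_pid if rng.random() < busy_share else net_pid
        # The packet is for net_pid, but the probe reports current.
        seen[current] += 1
    return seen


# A CPU hog holding ~95% of the CPU "owns" most of the received packets.
print(captured_pids(1000, busy_pid=111, net_pid=222, busy_share=0.95))
```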
tcpretransmit exhibits the same behavior with both the kprobe and tracepoint flavors. In most cases the PID is 0; in some cases it belongs to a completely unrelated process.
Unrelated-process example: I'm running telnet to 10.10.10.10:23 (nothing is listening there). Most tcpretransmit events show PID 0, but some show the PID of firefox (firefox is not attempting to connect to 10.10.10.10, nor to anything on port 23).
@michaelgugino That is true. A lot of retransmits are handled by the kernel's tcp_write_timer, which is executed in softirq context. The underlying process could be the idle thread (PID 0) or any other thread.
This also happened when I used tcp_set_state. I wanted to get the user-space PID, so I used current->pid, but I got a completely unrelated PID.
So, is there a way to get the correct user-space PID and comm?
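One common workaround, sketched here under assumptions, is to record the PID and comm at a hook that runs in process context (for example the connect() or accept() path, where current really is the socket's user), keyed by the struct sock pointer, and then look that entry up from hooks that may fire in softirq context, such as tcp_rcv_established, instead of reading current. As far as I know, bcc's tcplife caches the PID per socket in a similar way. The Python below is only a user-space model of the pattern: the dict stands in for a BPF hash map, and Owner, on_process_context, on_softirq and on_close are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Owner:
    pid: int
    comm: str


# Stand-in for a BPF hash map: struct sock * address -> owning process.
sock_owner: Dict[int, Owner] = {}


def on_process_context(sk: int, pid: int, comm: str) -> None:
    """Hook in process context (e.g. connect/accept): current is trustworthy."""
    sock_owner[sk] = Owner(pid, comm)


def on_softirq(sk: int) -> Optional[Owner]:
    """Hook that may run in softirq: ignore current, look the socket up."""
    return sock_owner.get(sk)


def on_close(sk: int) -> None:
    """Drop the entry when the socket goes away so the address can be reused."""
    sock_owner.pop(sk, None)


sk = 0xFFFF888012345678  # hypothetical struct sock * address
on_process_context(sk, 4242, "curl")
print(on_softirq(sk))    # Owner(pid=4242, comm='curl')
on_close(sk)
print(on_softirq(sk))    # None
```

The cache only helps for sockets whose process-context hook fired while the tracer was running; connections opened earlier will have no entry.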