ksoftirqd high CPU usage under network load

Open btrepp opened this issue 4 years ago • 1 comments

Hi all,

I'm unsure if this is normal, but whenever the network is under load, the CPU usage of ksoftirqd skyrockets, which seems to slow everything down. I've noticed some applications struggling to serve more than 10-30 Mbit/s.

This is on the RPi 4 model with 8 GB of RAM.

To reproduce, simply run curl (it maxes out at around 60 Mbit/s) on a Pi connected to a 1 Gbit switch with a 100 Mbit WAN connection:

curl http://ipv4.download.thinkbroadband.com/512MB.zip > /dev/null

This results in the following in top:

top - 14:31:10 up 1 day, 56 min,  2 users,  load average: 1.01, 1.73, 2.16
Threads: 697 total,   2 running, 695 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.2 us, 10.4 sy,  0.0 ni, 61.5 id,  0.0 wa,  0.0 hi, 23.9 si,  0.0 st
MiB Mem :   7813.0 total,   4144.3 free,   1103.2 used,   2565.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   6555.2 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   12 root      20   0       0      0      0 R  93.2   0.0  43:47.43 ksoftirqd/0
30615 pi        20   0   94628   8348   7172 S  32.4   0.1   0:06.17 curl

The Pi doesn't seem too toasty either.

 cat /sys/class/thermal/thermal_zone0/temp
54043

is what it hovers around during the download (the value is in millidegrees Celsius, so about 54 °C).

Sep 26 '21 06:09 btrepp

Googling "ksoftirqd", the top hit is https://askubuntu.com/questions/7858/why-is-ksoftirqd-0-process-using-all-of-my-cpu, to which the answer says:

Your computer communicates with the devices attached to it through IRQs (interrupt requests). When an interrupt comes from a device, the operating system pauses what it was doing and starts addressing that interrupt.

In some situations IRQs come very very fast one after the other and the operating system cannot finish servicing one before another one arrives. This can happen when a high speed network card receives a very large number of packets in a short time frame.

Because the operating system cannot handle IRQs as they arrive (because they arrive too fast one after the other), the operating system queues them for later processing by a special internal process named ksoftirqd.

If ksoftirqd is taking more than a tiny percentage of CPU time, this indicates the machine is under heavy interrupt load.
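To see which kind of deferred work ksoftirqd is handling, one option is to compare two snapshots of /proc/softirqs and look at the NET_RX row. The following is only a rough sketch: the helper names parse_softirqs and softirq_delta are made up for illustration, and the sample counts are invented, not taken from the Pi in this issue.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {name: [count per CPU]}."""
    lines = text.strip().splitlines()
    counts = {}
    for line in lines[1:]:  # the first line is the CPU0 CPU1 ... header
        name, _, rest = line.partition(":")
        counts[name.strip()] = [int(n) for n in rest.split()]
    return counts


def softirq_delta(before, after, name):
    """Per-CPU increase of one softirq type between two snapshots."""
    return [b - a for a, b in zip(before[name], after[name])]


# Invented sample data, shaped like /proc/softirqs on a 4-core machine.
SAMPLE_BEFORE = """\
                    CPU0       CPU1       CPU2       CPU3
      NET_TX:        100          0          0          0
      NET_RX:       5000         10         10         10
"""
SAMPLE_AFTER = """\
                    CPU0       CPU1       CPU2       CPU3
      NET_TX:        130          0          0          0
      NET_RX:       9000         10         12         10
"""

before = parse_softirqs(SAMPLE_BEFORE)
after = parse_softirqs(SAMPLE_AFTER)
print("NET_RX increase per CPU:", softirq_delta(before, after, "NET_RX"))
```

On a real system the two snapshots would come from reading /proc/softirqs before and after the download. A NET_RX increase concentrated on one CPU would match the single busy ksoftirqd/0 thread seen in top.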

Checking /proc/interrupts before and after running that curl I see:

<  73:    1208903          0          0          0     GICv2 189 Level     eth0
<  74:     118762          0          0          0     GICv2 190 Level     eth0
---
>  73:    1516220          0          0          0     GICv2 189 Level     eth0
>  74:     226089          0          0          0     GICv2 190 Level     eth0

The total number of interrupts for eth0 in that period is about 415000 in 7 seconds (this is on an office LAN with a fair amount of broadcast and multicast traffic), which does sound quite high, but (512 * 1000 * 1000)/1500 (the download size divided by the MTU) is 341333, and there is traffic in the other direction as well, so it doesn't sound excessive.
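For anyone who wants to check that arithmetic, here is a small sketch using the counts from the /proc/interrupts diff above. The function names are just for illustration, and it assumes the download is exactly 512 * 1000 * 1000 bytes and that every packet is a full 1500 bytes, which is only an approximation.

```python
# CPU0 counts for the two eth0 interrupt lines, copied from the diff above.
before = {73: 1208903, 74: 118762}
after = {73: 1516220, 74: 226089}


def irq_delta(before, after):
    """Total interrupts raised between the two snapshots."""
    return sum(after[irq] - before[irq] for irq in before)


def full_size_packets(total_bytes, mtu=1500):
    """Number of whole MTU-sized packets needed to carry total_bytes."""
    return total_bytes // mtu


total_irqs = irq_delta(before, after)            # 414644, "about 415000"
packets = full_size_packets(512 * 1000 * 1000)   # 341333

print("interrupts:", total_irqs)
print("full-size packets:", packets)
print("interrupts per packet: %.2f" % (total_irqs / packets))
```

The ratio comes out a little above one interrupt per full-size received packet, before accounting for the transmit direction and the background LAN traffic.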

In other words, I've seen nothing to suggest that Linux isn't working as expected.

Sep 27 '21 07:09 pelwell