ksoftirqd high CPU usage under network load
Hi all,
Unsure if this is normal, but whenever the network is under load, the CPU usage of ksoftirqd skyrockets. This seems to end up slowing everything down. I've noticed some applications really struggling to serve more than 10-30 Mbit/s.
This is on the Raspberry Pi 4 (8 GB RAM model).
To reproduce, simply run curl (it maxes out at around 60 Mbit/s) on a Pi connected to a 1 Gbit switch, with a 100 Mbit WAN connection:
curl http://ipv4.download.thinkbroadband.com/512MB.zip > /dev/null
This results in the following in top:
top - 14:31:10 up 1 day, 56 min, 2 users, load average: 1.01, 1.73, 2.16
Threads: 697 total, 2 running, 695 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.2 us, 10.4 sy, 0.0 ni, 61.5 id, 0.0 wa, 0.0 hi, 23.9 si, 0.0 st
MiB Mem : 7813.0 total, 4144.3 free, 1103.2 used, 2565.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 6555.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12 root 20 0 0 0 0 R 93.2 0.0 43:47.43 ksoftirqd/0
30615 pi 20 0 94628 8348 7172 S 32.4 0.1 0:06.17 curl
The Pi doesn't seem too toasty either.
cat /sys/class/thermal/thermal_zone0/temp
54043
is what it hovers around during the download (the value is in millidegrees Celsius, so about 54 °C).
Googling "ksoftirqd", the top hit is https://askubuntu.com/questions/7858/why-is-ksoftirqd-0-process-using-all-of-my-cpu, where the answer says:
Your computer communicates with the devices attached to it through IRQs (interrupt requests). When an interrupt comes from a device, the operating system pauses what it was doing and starts addressing that interrupt.
In some situations IRQs come very very fast one after the other and the operating system cannot finish servicing one before another one arrives. This can happen when a high speed network card receives a very large number of packets in a short time frame.
Because the operating system cannot handle IRQs as they arrive (because they arrive too fast one after the other), the operating system queues them for later processing by a special internal process named ksoftirqd.
If ksoftirqd is taking more than a tiny percentage of CPU time, this indicates the machine is under heavy interrupt load.
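One way to check where that deferred work lands is to look at the per-CPU softirq counters that Linux exposes in /proc/softirqs. The sketch below is only an illustration: the helper name softirq_counts is made up for this post, and it assumes the usual layout of that file (a header row of CPU names, then one row per softirq type such as NET_RX). If nearly all NET_RX counts sit on CPU0, that would be consistent with ksoftirqd/0 being the busy thread.

```python
from pathlib import Path


def softirq_counts(text, name="NET_RX"):
    """Return {cpu: count} for one row of /proc/softirqs-formatted text.

    Hypothetical helper; assumes a header row of CPU names followed by
    rows of the form 'NAME: n0 n1 ...'.
    """
    lines = text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == name + ":":
            return dict(zip(cpus, (int(f) for f in fields[1:])))
    raise KeyError(name)


path = Path("/proc/softirqs")
if path.exists():  # only present on Linux
    print(softirq_counts(path.read_text()))
```

Running it before and after the download and subtracting the two results shows how many network-receive softirqs each core handled in between.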
Checking /proc/interrupts before and after running that curl I see:
< 73: 1208903 0 0 0 GICv2 189 Level eth0
< 74: 118762 0 0 0 GICv2 190 Level eth0
---
> 73: 1516220 0 0 0 GICv2 189 Level eth0
> 74: 226089 0 0 0 GICv2 190 Level eth0
The total number of interrupts for eth0 in that period is about 415000 in 7 seconds (this is on an office LAN with a fair amount of broadcast and multicast traffic), which does sound quite high. But (512*1000*1000)/1500 (the MTU) is 341333, and there is traffic in the other direction as well, so it doesn't sound excessive.
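For anyone who wants to repeat that arithmetic on their own machine, here is a rough sketch. The counters are copied from the two /proc/interrupts snapshots above; the helper name irq_totals is made up for this post, and the frame count assumes the file is 512*1000*1000 bytes and that every frame carries a full 1500 bytes of it, so it is a lower bound rather than an exact figure.

```python
def irq_totals(snapshot, device="eth0"):
    """Sum the per-CPU counters of each IRQ line whose last field is `device`.

    Hypothetical helper for /proc/interrupts-style lines.
    """
    totals = {}
    for line in snapshot.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].endswith(":") or fields[-1] != device:
            continue
        count = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # reached the interrupt controller name, e.g. GICv2
            count += int(field)
        totals[fields[0].rstrip(":")] = count
    return totals


# Counters copied from the before/after snapshots in this post.
before = """\
73: 1208903 0 0 0 GICv2 189 Level eth0
74: 118762 0 0 0 GICv2 190 Level eth0
"""
after = """\
73: 1516220 0 0 0 GICv2 189 Level eth0
74: 226089 0 0 0 GICv2 190 Level eth0
"""

old, new = irq_totals(before), irq_totals(after)
delta = {irq: new[irq] - old[irq] for irq in old}
interrupts = sum(delta.values())

# Lower bound on frames: assumes each frame carries 1500 bytes of the file.
frames = (512 * 1000 * 1000) // 1500

print(delta)
print(interrupts, frames, round(interrupts / frames, 2))
```

With these numbers the two eth0 lines add up to 414644 interrupts against at least 341333 frames, so roughly 1.2 interrupts per full-size frame before counting traffic in the other direction or the background LAN traffic.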
In other words, I've seen nothing to suggest that Linux isn't working as expected.