Add softirq processing time into Off-CPU time

Open Radrik5 opened this issue 6 months ago • 2 comments
Is your feature request related to a problem? Please describe.

The Linux kernel processes softirqs inside system calls, but Hotspot does not show them.

I have a single-threaded network service that forwards data from one TCP connection to another using non-blocking sockets (there are no waits or sleeps in user space). When the service is overloaded (no idle time, full TCP receive buffers), CPU usage reported by top/htop is 65%, the number of On-CPU samples reported by Hotspot is 65K per second (65%, using perf record -F 999), and Off-CPU time is 7ms per second (0.7% of the time). The remaining 35% of the time is spent in softirq handlers and is not visible in Hotspot.

Describe the solution you'd like

If I add softirq events to perf record, I would like Hotspot to include them in Off-CPU time, as it does for scheduler events. In this case, the remaining 35% of Off-CPU time would be visible in Hotspot.

perf record -e irq:softirq_entry -e irq:softirq_exit -e irq:softirq_raise ...
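
A minimal sketch of the accounting this request implies, assuming the recorded tracepoints have already been decoded into plain tuples. The tuple layout `(timestamp_ns, cpu, name, vec)` and the function name `softirq_time_per_cpu` are hypothetical and are not part of perf or Hotspot. It pairs each `irq:softirq_entry` with the next `irq:softirq_exit` for the same CPU and softirq vector and sums the elapsed time per CPU.

```python
from collections import defaultdict


def softirq_time_per_cpu(events):
    """Sum softirq handler time per CPU.

    `events` is an iterable of (timestamp_ns, cpu, name, vec) tuples sorted
    by timestamp. This layout is an assumption made for illustration only.
    """
    open_entries = {}          # (cpu, vec) -> timestamp of the pending entry
    totals = defaultdict(int)  # cpu -> accumulated nanoseconds

    for timestamp_ns, cpu, name, vec in events:
        key = (cpu, vec)
        if name == "irq:softirq_entry":
            open_entries[key] = timestamp_ns
        elif name == "irq:softirq_exit":
            start = open_entries.pop(key, None)
            if start is not None:  # ignore an exit without a recorded entry
                totals[cpu] += timestamp_ns - start
        # irq:softirq_raise only marks a request, so it adds no time here

    return dict(totals)
```

Attributing that time to the interrupted thread, as Off-CPU time would require, needs the thread ID from each sample as well; the sketch leaves that out.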

Describe alternatives you've considered

I haven't found any alternatives. Even Brendan Gregg's FlameGraph scripts do not support it.

Additional context

None.

Radrik5 avatar Apr 27 '25 14:04 Radrik5

Sounds like a great idea, but I'll need an MWE that reproduces the setup so that I can experiment with recording the data and then assemble the required analysis code on our end. Can you provide such code? Barring that, e.g. if it doesn't reproduce on x86 or the like, can you provide a perf.data file with the DSOs referenced by it and a sysroot to allow me to analyze the file locally?

milianw avatar May 07 '25 13:05 milianw

I created a small project that replicates the issue: https://github.com/Radrik5/softirq

While playing with perf record for it, I noticed that the number of irq:softirq_* events is so large that I had to increase the buffer to -m 1G to avoid losing chunks, and then limit the number of events with -F 1000 because the data in Hotspot looked corrupted.

Maybe perf record could collect such events with -c 100 or -c 1000, and then Hotspot could multiply softirq time by 100 or 1000 to get a more or less accurate estimate of the off-CPU time spent in softirq processing. perf report --header displays the sampling parameters for each event:

  • -F 1000: # event : name = irq:softirq_entry, ..., { sample_period, sample_freq } = 1000, ..., freq = 1, ...
  • -c 1001: # event : name = irq:softirq_entry, ..., { sample_period, sample_freq } = 1001, ... without freq = 1
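
A rough sketch of how a tool might read those header lines and apply the proposed multiplier. The function names `parse_event_header` and `scale_sampled_time` are hypothetical, and the parser only assumes the fields visible in the two lines above. It also assumes that a fixed multiplier is only meaningful in period mode (`-c`); in frequency mode (`-F`) the kernel adjusts the period, so the sketch refuses to scale. Whether sampled entry and exit events still pair up is not addressed here.

```python
import re


def parse_event_header(line):
    """Return (event_name, value, is_freq) from a `perf report --header` event line.

    Returns None if the line lacks the expected fields.
    """
    name = re.search(r"name = ([^,]+),", line)
    value = re.search(r"\{ sample_period, sample_freq \} = (\d+)", line)
    if name is None or value is None:
        return None
    # `freq = 1` marks frequency mode; `\b` keeps `sample_freq` from matching
    is_freq = re.search(r"\bfreq = 1\b", line) is not None
    return name.group(1).strip(), int(value.group(1)), is_freq


def scale_sampled_time(sampled_time_ns, header):
    """Multiply time measured from sampled events by the fixed sample period."""
    _, value, is_freq = header
    if is_freq:
        raise ValueError("frequency mode: no fixed period to multiply by")
    return sampled_time_ns * value
```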

However, adding -c or -F to perf record affects all selected events (including sched:sched_switch), so it may affect accuracy of off-CPU time calculation in Hotspot. I haven't found a way to specify different sampling parameters for different events.

Radrik5 avatar May 08 '25 13:05 Radrik5

see https://www.man7.org/linux/man-pages/man1/perf-record.1.html and look for

a symbolically formed event like
               pmu/config=M,config1=N,config3=K/
...
                   Here are some common parameters:
                   - 'period': Set event sampling period
                   - 'freq': Set event sampling frequency

for how to define a custom sampling configuration for an individual event.

That said, what you describe sounds interesting but is not of immediate use to me. As such, I won't find the time to work on this any time soon. But patches welcome!

milianw avatar Jul 18 '25 06:07 milianw