CPU consumer (cpu_hog) is really really bad at consuming CPU

Open marcusmueller opened this issue 8 months ago • 0 comments

The problem

Since the purpose of stress -c is to put real load on the specified number of CPU cores, the pipelines of those cores shouldn't be stalled for the majority of the time. Yet:

perf stat stress -c 16 -t 5

tells us that the CPU, while occupied, is mostly stalled:

stress: info: [526148] dispatching hogs: 16 cpu, 0 io, 0 vm, 0 hdd
stress: info: [526148] successful run completed in 5s

 Performance counter stats for 'stress -c 16 -t 5':

         79,580.45 msec task-clock:u                     #   15.910 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               309      page-faults:u                    #    3.883 /sec
   418,716,815,425      cycles:u                         #    5.262 GHz
   262,176,845,042      stalled-cycles-frontend:u        #   62.61% frontend cycles idle
   617,055,840,870      instructions:u                   #    1.47  insn per cycle
                                                  #    0.42  stalled cycles per insn
   175,186,890,751      branches:u                       #    2.201 G/sec
       269,450,686      branch-misses:u                  #    0.15% of all branches

       5.001799550 seconds time elapsed

      79.463002000 seconds user
       0.007854000 seconds sys

I'd like to draw attention to 62.61% frontend cycles idle, meaning that for most cycles the CPU cores' frontends couldn't proceed with processing instructions. That means, for example, on hyperthreading cores, that we're not fully using our "half-core", and thus not slowing down the other half of the hyperthreaded core as intended (which is what I needed stress for).

Why does that happen?

Simple. The code is

while(1){sqrt(rand());}

which, contrary to what the man page claims, isn't actually "spinning on sqrt()". In fact, a compiler with floating point exceptions disabled might notice that the result of sqrt is never used and simply not execute it at all. (That's not what's happening in a default build, however.)

Instead, the bottleneck is rand(), which isn't even reentrant and should never have been used from multiple threads. In a perf record stress -c 16 -t 5 on a modern x86_64 machine, you'll notice that the CPU is stuck in __random() about 99.8% of the time. That's of little surprise, because, and that's the problem here, rand() modifies global state and hence heavily depends on memory views being kept consistent between CPU cores.

So: using rand() here is a serious bug, and using sqrt() without doing anything with the result is a minor one.
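To make both points concrete, here is a minimal sketch of a hog loop body that avoids them. It is not the project's code and not a proposed patch: `xorshift64` and `hog_rounds` are hypothetical names, the generator is a plain Marsaglia xorshift swapped in for rand(), and sqrt() is left out on purpose so the sketch builds without libm. The two ideas it illustrates are that the generator's state lives in a local variable (no global state, nothing shared), and that the computed value is returned so the compiler can't throw the work away.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift64 (shift triple 13, 7, 17): the whole generator state lives in
 * the caller's variable, so no global state is read or written. */
static inline uint64_t xorshift64(uint64_t *state)
{
    assert(*state != 0);  /* an all-zero state would stay zero forever */
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical hog body (assumption, not taken from stress): run a fixed
 * number of rounds and return the accumulated sum. Returning the sum
 * gives the loop an observable result, so it can't be optimized out. */
uint64_t hog_rounds(uint64_t seed, unsigned long iterations)
{
    /* Fall back to an arbitrary non-zero constant if the seed is zero. */
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;
    uint64_t acc = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        acc += xorshift64(&state);  /* unsigned wrap-around is well defined */
    }
    return acc;
}
```

A real hog would call something like this in an endless loop, with a different seed per worker, and store the return value somewhere visible (a volatile variable, say). If the loop should still exercise sqrt(), the same rule applies: its result has to end up in the accumulator.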

marcusmueller, Feb 20 '25 22:02