
[Bug] CPU usage for firecracker grows out of control

Open · jeromegn opened this issue 2 years ago

Describe the bug

After some time, some firecracker processes on our hosts grow to use as much CPU as possible (100%). Inside the guests, CPU usage is low (2-3%).

We have collected some diagnostics...

perf kvm on a high-CPU firecracker

Analyze events for pid(s) 6641, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time 

                 hlt      40833    99.69%    98.26%      0.32us  19528.58us     29.19us ( +-   5.93% )
           interrupt         76     0.19%     1.74%      0.43us   9798.54us    277.33us ( +-  57.11% )
                 msr         49     0.12%     0.01%      0.71us      2.82us      1.55us ( +-   5.31% )
               vintr          2     0.00%     0.00%      1.36us      2.16us      1.76us ( +-  22.73% )

Total Samples:40960, Total events handled time:1213240.92us.

perf kvm on a normally operating firecracker

Analyze events for pid(s) 39114, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time 

                 msr        351    59.69%     7.81%      0.73us 118078.80us    618.57us ( +-  61.33% )
                 hlt        106    18.03%    81.49%     69.82us 155820.39us  21370.64us ( +-  12.99% )
           interrupt         75    12.76%     1.17%      0.44us  29505.95us    433.52us ( +-  90.73% )
                 npf         39     6.63%     9.53%      3.03us 139254.99us   6792.68us ( +-  58.72% )
               cpuid         12     2.04%     0.00%      0.49us      1.97us      0.79us ( +-  15.77% )
               vintr          5     0.85%     0.00%      0.91us      2.02us      1.30us ( +-  16.12% )

Total Samples:588, Total events handled time:2779849.81us.

Running strace on a high-CPU firecracker shows that most of the traced time goes to ioctl calls, i.e. KVM_RUN.

strace on a high-CPU firecracker

$ timeout 120 strace -fp 18146 -c
strace: Process 18146 attached with 3 threads
strace: Process 18146 detached
strace: Process 18207 detached
strace: Process 18255 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.61   43.381786       12155      3569           ioctl
 41.21   33.342295       14347      2324           epoll_pwait
  2.79    2.254346         539      4183       906 read
  2.03    1.643915         361      4558       599 write
  0.31    0.251635         450       559           lseek
  0.03    0.027855         387        72           recvfrom
  0.01    0.009457         131        72           epoll_ctl
  0.00    0.002255         188        12           accept4
  0.00    0.001461         122        12           close
------ ----------- ----------- --------- --------- ----------------
100.00   80.915005                 15361      1505 total

strace on normally-operating firecracker

$ timeout 120 strace -fp 522 -c
strace: Process 522 attached with 3 threads
strace: Process 522 detached
strace: Process 550 detached
strace: Process 612 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.57    0.406513        1099       370           ioctl
 41.69    0.322366        1452       222           epoll_pwait
  3.28    0.025333          86       294        65 read
  2.12    0.016372          22       731       427 write
  0.14    0.001062          15        72           epoll_ctl
  0.12    0.000950          13        72           recvfrom
  0.06    0.000442          37        12           accept4
  0.04    0.000298          25        12           close
------ ----------- ----------- --------- --------- ----------------
100.00    0.773336                  1785       492 total

(These two firecracker processes were on the same host, running at the same time.)
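
As context for the strace output above: a VMM vCPU thread drives the guest from a loop around ioctl(KVM_RUN), and the time attributed to that ioctl covers both guest execution and KVM's in-kernel exit handling. That is how roughly a million hlt exits per second can show up as ioctl time even though strace counts only a few thousand calls. A minimal sketch of such a loop, using the kvm-ioctls crate; this is illustrative only, not Firecracker's actual run loop:

use kvm_ioctls::{VcpuExit, VcpuFd};

// Illustrative vCPU loop: each iteration is one ioctl(KVM_RUN) round
// trip, and the kernel side of that ioctl is where in-kernel exits
// (like the hlt storm above) burn CPU without returning to userspace.
fn run_vcpu(vcpu: &mut VcpuFd) {
    loop {
        match vcpu.run().expect("KVM_RUN failed") {
            VcpuExit::Shutdown => return, // guest shut down
            exit => println!("unhandled exit: {:?}", exit),
        }
    }
}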

CPU usage inside the guest for a high-CPU firecracker

[chart: guest CPU usage over time; green is normal CPU, red is kernel CPU, light blue is steal CPU]

To Reproduce

We do not know how to reproduce this. It happens seemingly randomly.

Expected behaviour

CPU usage of firecracker should be proportional to CPU usage inside the guest.

Was this a known bug with the versions we're using? Would upgrading to a more recent firecracker help?

Environment

  • Firecracker version: mix of v0.23.5 and v0.24.4
  • Host kernel: mix of 5.11.7 and 5.11.16
  • Guest kernel: 5.12.2
  • Rootfs: varies wildly, often ubuntu, debian or alpine
  • Arch: x86_64

Additional context

This drives CPU usage and load on the affected host to unsustainable levels. VMs running on the host all experience significant steal time even though we are not overprovisioning CPU.

Checks

  • [x] Have you searched the Firecracker Issues database for similar problems?
  • [x] Have you read the existing relevant Firecracker documentation?
  • [ ] Are you certain the bug being reported is a Firecracker issue?

I am not certain this is a firecracker issue, but I don't know what else it could be at this point.

jeromegn avatar Jul 01 '22 14:07 jeromegn

Update: We noticed this only happens on host kernel v5.11.

Diagnosing more with kvm_stat, I noticed the "bad" firecracker process had absolutely no halt_wakeup events while correctly functioning firecracker processes had roughly the same amount of halt_exit and halt_wakeup events.
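
For anyone repeating this check: the counters kvm_stat displays can also be read straight from debugfs. A small sketch, assuming the usual layout where each VM gets a /sys/kernel/debug/kvm/<pid>-<fd>/ directory with one file per counter (the directory name below is hypothetical):

use std::fs;

// Read a single KVM debugfs counter from a per-VM directory.
fn read_counter(vm_dir: &str, name: &str) -> std::io::Result<u64> {
    let raw = fs::read_to_string(format!("{vm_dir}/{name}"))?;
    Ok(raw.trim().parse().unwrap_or(0))
}

fn main() -> std::io::Result<()> {
    // Hypothetical <pid>-<fd> entry; substitute the real one.
    let vm_dir = "/sys/kernel/debug/kvm/14577-11";
    // On a healthy VM these advance together; on the stuck VM,
    // halt_wakeup stayed flat while halt_exits kept climbing.
    println!("halt_exits  = {}", read_counter(vm_dir, "halt_exits")?);
    println!("halt_wakeup = {}", read_counter(vm_dir, "halt_wakeup")?);
    Ok(())
}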

jeromegn avatar Jul 02 '22 22:07 jeromegn

We're not seeing any halt-polling events:

$ cat /sys/kernel/debug/tracing/set_event
kvm:kvm_vcpu_wakeup
kvm:kvm_halt_poll_ns

$ cat /sys/kernel/debug/tracing/set_event_pid
14577 # problematic firecracker

$ cat /sys/kernel/debug/tracing/trace_pipe
# nothing...
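
(For completeness, those trace events were enabled by writing to the same tracefs files. A sketch of the equivalent setup, assuming tracefs is mounted under /sys/kernel/debug/tracing:)

use std::fs;

// Enable only the KVM halt-polling events for one VMM process; the
// equivalent of `echo ... > set_event` and friends from a shell.
fn enable_halt_tracing(vmm_pid: u32) -> std::io::Result<()> {
    let t = "/sys/kernel/debug/tracing";
    fs::write(
        format!("{t}/set_event"),
        "kvm:kvm_vcpu_wakeup\nkvm:kvm_halt_poll_ns\n",
    )?;
    fs::write(format!("{t}/set_event_pid"), vmm_pid.to_string())?;
    fs::write(format!("{t}/tracing_on"), "1")?;
    Ok(())
}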

jeromegn avatar Jul 03 '22 14:07 jeromegn

I believe this issue is the same as what we're currently facing: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/2277#issuecomment-817992687

We're seeing roughly 1 million exits per second on affected VMs.

The fix appears to be to disable KVM_FEATURE_ASYNC_PF_INT on the guest CPUID: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/2498
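
For reference, the workaround amounts to clearing one feature bit in the KVM paravirt CPUID leaf before the entries are handed to KVM_SET_CPUID2. A minimal sketch using the kvm-bindings crate; leaf and bit numbers are from the kernel's Documentation/virt/kvm/cpuid.rst, and this is not the exact fix code:

use kvm_bindings::CpuId;

// KVM paravirt features leaf, and the ASYNC_PF_INT feature bit in EAX.
const KVM_CPUID_FEATURES: u32 = 0x4000_0001;
const KVM_FEATURE_ASYNC_PF_INT: u32 = 14;

// Clear KVM_FEATURE_ASYNC_PF_INT from the guest CPUID so the guest
// never enables the interrupt-based async page fault mechanism that
// triggers the exit storm.
fn disable_async_pf_int(cpuid: &mut CpuId) {
    for entry in cpuid.as_mut_slice().iter_mut() {
        if entry.function == KVM_CPUID_FEATURES {
            entry.eax &= !(1u32 << KVM_FEATURE_ASYNC_PF_INT);
        }
    }
}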

jeromegn avatar Jul 04 '22 01:07 jeromegn

I have implemented the same fix in a fork here: https://github.com/jeromegn/firecracker/commit/d6e73e4405d8f119605291750099f94d6b9715c8

Please let me know if I'm making a grave mistake 😅

jeromegn avatar Jul 05 '22 13:07 jeromegn

/close, it is resolved.

xiaods avatar Dec 13 '22 15:12 xiaods

Did the commits make it into this repo? I don't see them.

jeromegn avatar Dec 13 '22 19:12 jeromegn

> Did the commits make it into this repo? I don't see them.

Oops, you are right, there is no PR for that yet.

xiaods avatar Dec 14 '22 08:12 xiaods

Disabling KVM_FEATURE_ASYNC_PF_INT was a temporary workaround, per the conversations below: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/2277#issuecomment-820200452 https://github.com/cloud-hypervisor/cloud-hypervisor/issues/2277#issuecomment-824805348

Disabling KVM_FEATURE_ASYNC_PF_INT avoids the symptom, but the actual bug is in KVM, and the fix is in the patch below on the host side: https://patchwork.kernel.org/project/kvm/patch/[email protected]/

I see the fix in https://github.com/torvalds/linux/blob/v5.13/arch/x86/kvm/lapic.c, so moving to a host kernel at v5.13 or above should resolve this issue.
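
(If anyone needs to gate the CPUID workaround on the host kernel, a sketch of a version check reading /proc/sys/kernel/osrelease; purely illustrative:)

use std::fs;

// Returns true if the running host kernel is older than v5.13, i.e.
// may still carry the KVM async-PF bug fixed by the patch above.
fn host_kernel_older_than_5_13() -> std::io::Result<bool> {
    let release = fs::read_to_string("/proc/sys/kernel/osrelease")?;
    let mut parts = release.trim().split('.');
    let major: u32 = parts.next().unwrap_or("0").parse().unwrap_or(0);
    // The minor component may carry a suffix (e.g. "11-generic"),
    // so keep only its leading digits before parsing.
    let minor: u32 = parts
        .next()
        .unwrap_or("0")
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect::<String>()
        .parse()
        .unwrap_or(0);
    Ok((major, minor) < (5, 13))
}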

sudanl0 avatar Jul 11 '23 09:07 sudanl0

Closing this issue. The fix is present in newer Linux kernels (v5.13 and above). If anyone sees this again, please reopen it.

ShadowCurse avatar Sep 25 '23 10:09 ShadowCurse