[WIP] fix(vmm): call KVMCLOCK_CTRL when pausing

Open • kalyazin opened this pull request 1 year ago • 1 comment

Changes

Call the KVM_KVMCLOCK_CTRL ioctl on the vCPUs when pausing the microVM.

Related: https://github.com/firecracker-microvm/firecracker/issues/1859

TODO (if merging): update the changelog
TODO (if merging): update the docs
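
The change lives in the VMM's vCPU pause path. The sketch below is a simplified model, not Firecracker's code: `KvmClockCtrl`, `Vcpu`, `pause_all` and `FakeFd` are hypothetical names, and the trait stands in for the vCPU fd (the kvm-ioctls crate offers a `kvmclock_ctrl()` method on `VcpuFd` for x86_64) so the sketch runs without `/dev/kvm`. It shows the ordering the KVM API documentation asks for: the ioctl is issued after the vCPU is paused and before it is resumed. Treating `EINVAL` as non-fatal is an assumption; in current Linux sources KVM returns it when the guest has not enabled kvmclock.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicBool, Ordering};

/// errno value Linux uses for "invalid argument".
const EINVAL: i32 = 22;

/// Hypothetical abstraction over the vCPU fd. A real VMM would issue the
/// KVM_KVMCLOCK_CTRL ioctl here (e.g. via kvm-ioctls' `VcpuFd::kvmclock_ctrl()`).
trait KvmClockCtrl {
    /// Returns Err(errno) on failure, like the raw ioctl.
    fn kvmclock_ctrl(&self) -> Result<(), i32>;
}

/// Hypothetical vCPU handle; `running` models "the vCPU thread is in KVM_RUN".
struct Vcpu<F: KvmClockCtrl> {
    running: AtomicBool,
    fd: F,
}

impl<F: KvmClockCtrl> Vcpu<F> {
    fn new(fd: F) -> Self {
        Vcpu { running: AtomicBool::new(true), fd }
    }

    fn is_running(&self) -> bool {
        self.running.load(Ordering::SeqCst)
    }

    /// Pause first, then notify KVM: the ioctl is meant to be called
    /// while the vCPU is stopped (after pausing, before resuming).
    fn pause(&self) -> Result<(), i32> {
        // Stand-in for kicking the vCPU out of KVM_RUN and parking its thread.
        self.running.store(false, Ordering::SeqCst);
        match self.fd.kvmclock_ctrl() {
            Ok(()) => Ok(()),
            // Assumption: EINVAL means the guest never enabled kvmclock,
            // so there is no pvclock area to flag; do not fail the pause.
            Err(EINVAL) => Ok(()),
            Err(errno) => Err(errno),
        }
    }
}

/// Pause every vCPU, stopping at the first hard error.
fn pause_all<F: KvmClockCtrl>(vcpus: &[Vcpu<F>]) -> Result<(), i32> {
    vcpus.iter().try_for_each(|vcpu| vcpu.pause())
}

/// Hypothetical test double that counts how often the ioctl was "issued".
struct FakeFd {
    result: Result<(), i32>,
    calls: Cell<u32>,
}

impl KvmClockCtrl for FakeFd {
    fn kvmclock_ctrl(&self) -> Result<(), i32> {
        self.calls.set(self.calls.get() + 1);
        self.result
    }
}
```

Applied to the fakes, `pause_all` leaves every vCPU stopped and issues exactly one ioctl per vCPU; a `FakeFd` that returns `EINVAL` does not abort the pause, while any other errno is propagated to the caller.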

Reason

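This avoids a guest kernel panic on the resume path caused by soft-lockup detection: without the notification, the guest's soft-lockup watchdog mistakes the time its vCPUs spent paused for a lockup.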
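
On the guest side, the mechanism comes down to one bit shared by host and guest. Per the KVM API documentation, KVM_KVMCLOCK_CTRL makes the host set a flag (bit 1, `PVCLOCK_GUEST_STOPPED`) in the vCPU's `pvclock_vcpu_time_info.flags`, and the guest's soft-lockup watchdog atomically checks and clears it before reporting a lockup. The sketch below is a simplified single-vCPU model under those assumptions: `PvClock`, `watchdog_check` and `WatchdogVerdict` are hypothetical names, the host sets the bit directly (real KVM sets it on the vCPU's next clock update), and the lockup threshold is twice the default `watchdog_thresh` of 10 s.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Bit 1 of pvclock_vcpu_time_info.flags, as defined in the Linux headers.
const PVCLOCK_GUEST_STOPPED: u8 = 1 << 1;

/// Default kernel.watchdog_thresh; soft lockups are reported after 2x this.
const WATCHDOG_THRESH_SECS: u64 = 10;

/// Simplified model of the per-vCPU pvclock flags shared by host and guest.
struct PvClock {
    flags: AtomicU8,
}

impl PvClock {
    /// Host side: the effect KVM_KVMCLOCK_CTRL has on the shared flags.
    fn host_mark_stopped(&self) {
        self.flags.fetch_or(PVCLOCK_GUEST_STOPPED, Ordering::SeqCst);
    }

    /// Guest side: atomic test-and-clear of the bit, in the spirit of the
    /// guest kernel's kvm_check_and_clear_guest_paused().
    fn guest_check_and_clear_paused(&self) -> bool {
        let old = self.flags.fetch_and(!PVCLOCK_GUEST_STOPPED, Ordering::SeqCst);
        (old & PVCLOCK_GUEST_STOPPED) != 0
    }
}

#[derive(Debug, PartialEq)]
enum WatchdogVerdict {
    NoLockup,
    HostPaused,
    SoftLockup,
}

/// Simplified soft-lockup check run by the guest watchdog timer.
fn watchdog_check(clock: &PvClock, secs_since_touch: u64) -> WatchdogVerdict {
    if secs_since_touch <= 2 * WATCHDOG_THRESH_SECS {
        return WatchdogVerdict::NoLockup;
    }
    // Before reporting, ask whether the host stopped this vCPU.
    if clock.guest_check_and_clear_paused() {
        return WatchdogVerdict::HostPaused;
    }
    WatchdogVerdict::SoftLockup
}
```

Calling `host_mark_stopped()` before a 30 s gap turns what would be a `SoftLockup` verdict into `HostPaused`. Because the bit is cleared on the way, a second long gap without a new pause is reported as a lockup again, and other flag bits are left untouched.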

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following the Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.

PR Checklist

  • [x] If a specific issue led to this PR, this PR closes the issue.
  • [x] The description of changes is clear and encompassing.
  • [ ] Any required documentation changes (code and docs) are included in this PR.
  • ~~[ ] API changes follow the Runbook for Firecracker API changes.~~
  • [ ] User-facing changes are mentioned in CHANGELOG.md.
  • [x] All added/changed functionality is tested.
  • ~~[ ] New TODOs link to an issue.~~
  • [x] Commits meet contribution quality standards.
  • [x] This functionality cannot be added in rust-vmm.

kalyazin, Feb 19 '24 12:02

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.08%. Comparing base (0b9cf39) to head (d496d05). Report is 6 commits behind head on main.

@@           Coverage Diff           @@
##             main    #4460   +/-   ##
=======================================
  Coverage   84.07%   84.08%           
=======================================
  Files         251      251           
  Lines       28052    28060    +8     
=======================================
+ Hits        23586    23594    +8     
  Misses       4466     4466           

| Flag | Coverage Δ |
|---|---|
| 5.10-c5n.metal | 84.71% <100.00%> (-0.01%) ↓ |
| 5.10-m5n.metal | 84.69% <100.00%> (-0.01%) ↓ |
| 5.10-m6a.metal | 84.00% <100.00%> (-0.01%) ↓ |
| 5.10-m6g.metal | 80.70% <100.00%> (+<0.01%) ↑ |
| 5.10-m6i.metal | 84.69% <100.00%> (-0.01%) ↓ |
| 5.10-m7g.metal | 80.70% <100.00%> (+<0.01%) ↑ |
| 6.1-c5n.metal | 84.71% <100.00%> (-0.01%) ↓ |
| 6.1-m5n.metal | 84.69% <100.00%> (-0.01%) ↓ |
| 6.1-m6a.metal | 84.00% <100.00%> (-0.01%) ↓ |
| 6.1-m6g.metal | 80.70% <100.00%> (+<0.01%) ↑ |
| 6.1-m6i.metal | 84.69% <100.00%> (+<0.01%) ↑ |
| 6.1-m7g.metal | 80.70% <100.00%> (+<0.01%) ↑ |

Flags with carried forward coverage won't be shown.

codecov[bot], Feb 19 '24 12:02

I created a test that should make the guest kernel detect a soft lockup. I'll let CI run once to confirm that the test reproduces the failure on our CI, and then reapply the commits that call the KVM_KVMCLOCK_CTRL ioctl when we pause vCPUs. Those should make the failure go away.
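
The test itself is not reproduced in this thread. One plausible shape, assumed here rather than taken from the PR, is: pause the microVM for longer than the guest's soft-lockup threshold, resume it, and search the guest kernel log for the soft-lockup report. The helpers below (`min_pause_secs`, `saw_soft_lockup`) are hypothetical and only capture the two decisions such a test has to make; the example log line follows the format the Linux watchdog prints.

```rust
/// Default value of the guest's kernel.watchdog_thresh sysctl, in seconds.
const DEFAULT_WATCHDOG_THRESH_SECS: u64 = 10;

/// The soft-lockup detector fires after 2 * watchdog_thresh, so pause a bit
/// longer than that (by `margin_secs`) to trip it reliably without the fix.
fn min_pause_secs(watchdog_thresh_secs: u64, margin_secs: u64) -> u64 {
    2 * watchdog_thresh_secs + margin_secs
}

/// Scan guest console/dmesg output for the kernel's soft-lockup report, e.g.
/// "watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]".
fn saw_soft_lockup(guest_log: &str) -> bool {
    guest_log.lines().any(|line| line.contains("BUG: soft lockup"))
}
```

With the default threshold and a 5 s margin, `min_pause_secs(DEFAULT_WATCHDOG_THRESH_SECS, 5)` is 25 s. Without the fix, the resumed guest's log would be expected to contain the report; with the KVM_KVMCLOCK_CTRL commits applied, it should not.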

bchalios, Oct 22 '24 14:10

The test does trigger the lockup: https://buildkite.com/firecracker/firecracker-pr/builds/11546.

bchalios, Oct 22 '24 14:10

I've re-pushed Nikita's commits that add the call to the KVM_KVMCLOCK_CTRL ioctl. The tests should pass now.

bchalios, Oct 22 '24 15:10

The last commit is not related to this PR per se. However, it fixes an intermittent CI issue that I kept hitting in this PR's pipeline runs, and I was too lazy to open a separate PR for it.

bchalios, Oct 23 '24 11:10