[Bug] wrmsr restore test fails on T2A template

Open wearyzen opened this issue 2 years ago • 0 comments

Describe the bug

The wrmsr snapshot and restore tests (test_cpu_wrmsr_snapshot() and test_cpu_wrmsr_restore()) currently support only the T2S template. While trying to add support for the T2A template, the restore test failed with the error below:

E Traceback (most recent call last):
E   File "/firecracker/tests/framework/matrix.py", line 128, in _backtrack
E     self._run_test_fn(cartesian_product, test_fn)
E   File "/firecracker/tests/framework/matrix.py", line 154, in _run_test_fn
E     test_fn(self._context)
E   File "/firecracker/tests/integration_tests/functional/test_cpu_features.py", line 642, in _test_cpu_wrmsr_restore
E     dump_msr_state_to_file(msrs_after_fname, ssh_connection, shared_names)
E   File "/firecracker/tests/integration_tests/functional/test_cpu_features.py", line 378, in dump_msr_state_to_file
E     assert stderr.read() == ""
E AssertionError: assert '/bin/msr_rea...cate memory\n' == ''
E   + /bin/msr_reader.sh: fork: Cannot allocate memory
E   + /bin/msr_reader.sh: fork: Cannot allocate memory
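For readers unfamiliar with the check that fails, here is a minimal sketch of the assertion shape visible in the traceback (`assert stderr.read() == ""`). The helper name `check_guest_stderr` and the use of `io.StringIO` as a stand-in for the SSH stderr stream are assumptions for illustration, not the framework's actual code. It only shows that any output on the guest's stderr, such as the fork failure, is enough to fail the test.

```python
import io


def check_guest_stderr(stderr):
    """Hypothetical helper mirroring the failing check: any stderr output is fatal."""
    assert stderr.read() == ""


# Simulated guest stderr after restore; the message text is copied from the traceback.
failing = io.StringIO("/bin/msr_reader.sh: fork: Cannot allocate memory\n")

try:
    check_guest_stderr(failing)
    outcome = "passed"
except AssertionError:
    outcome = "failed"

print(outcome)
```

So the test failure is a symptom of the guest being unable to fork after restore, not of an MSR value mismatch.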

A workaround is to add a 250 ms delay in test_cpu_wrmsr_snapshot() before taking the snapshot (https://github.com/firecracker-microvm/firecracker/blob/6d43ad7b0d3309721c74e7ecdf637898bcdee655/tests/integration_tests/functional/test_cpu_features.py#L435), but the actual root cause is not known.
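The ordering that the workaround relies on can be sketched as follows. The two `*_stub` functions are hypothetical stand-ins for dump_msr_state_to_file() and vm.pause_to_snapshot(); they only record the call order and do not talk to a microVM. The sketch assumes the delay sits between the MSR dump and the snapshot, as the reproduction steps describe.

```python
import time

calls = []  # records the order of operations for illustration


def dump_msr_state_to_file_stub():
    # Stand-in (assumption) for dump_msr_state_to_file(); runs commands in the guest.
    calls.append("dump_msrs")


def pause_to_snapshot_stub():
    # Stand-in (assumption) for vm.pause_to_snapshot().
    calls.append("snapshot")


def snapshot_with_workaround(delay_s=0.25):
    dump_msr_state_to_file_stub()
    time.sleep(delay_s)  # the 250 ms workaround delay
    calls.append("sleep")
    pause_to_snapshot_stub()


snapshot_with_workaround()
print(calls)
```

Removing the `time.sleep` call from this sequence corresponds to the configuration in which the restore test fails.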

To Reproduce

  1. Add T2A in the MSR supported cpu templates list.
  2. In test_cpu_wrmsr_snapshot(), remove any delay between dump_msr_state_to_file() and pause_to_snapshot(), or apply the patch below on top of PR #3444:
diff --git a/tests/integration_tests/functional/test_cpu_features.py b/tests/integration_tests/functional/test_cpu_features.py
index 253ddf3a..3123aa0f 100644
--- a/tests/integration_tests/functional/test_cpu_features.py
+++ b/tests/integration_tests/functional/test_cpu_features.py
@@ -432,7 +432,7 @@ def _test_cpu_wrmsr_snapshot(context):
     # adding delay below as a workaround to unblock the tests for now.
     # TODO: Debug the issue and remove this delay. Create below issue to track this:
     # https://github.com/firecracker-microvm/firecracker/issues/3453
-    time.sleep(0.25)
+    #time.sleep(0.25)
 
     # Take a snapshot
     vm.pause_to_snapshot(

  3. Run the tests with the command below:

sudo tools/devtool -y test -- -s -rs -m nonci integration_tests/functional/test_cpu_features.py -k 'test_cpu_wrmsr_snapshot or test_cpu_wrmsr_restore'

Expected behaviour

The WRMSR snapshot and restore tests should pass for the T2A template, and the command below should not return an error:

sudo tools/devtool -y test -- -s -rs -m nonci integration_tests/functional/test_cpu_features.py -k 'test_cpu_wrmsr_snapshot or test_cpu_wrmsr_restore'

Environment

  • Firecracker version: Latest
  • Host and guest kernel versions: Issue is reproduced with 4.14 and 5.10 combination of guest and host kernels
  • Rootfs used: default rootfs
  • Architecture: AMD
  • Any other relevant software versions: N/A

Additional context

[Author TODO: How has this bug affected you?]

[Author TODO: What are you trying to achieve?] Add support for T2A template in the wrmsr snapshot/restore pytests.

[Author TODO: Do you have any idea of what the solution might be?] No

Checks

  • [x] Have you searched the Firecracker Issues database for similar problems?
  • [ ] Have you read the existing relevant Firecracker documentation?
  • [ ] Are you certain the bug being reported is a Firecracker issue?

wearyzen, Feb 16 '23 15:02