firecracker
firecracker copied to clipboard
test(jailer): enable multi-threaded test
Changes
Previously, all tests shared same temporary file/directory, causing concurrency conflicts when running tests in multi-threaded. Resolved test concurrency issues by incorporating random strings into file/directory names.
Reason
Fixes #4412
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
- [x] If a specific issue led to this PR, this PR closes the issue.
- [x] The description of changes is clear and encompassing.
- [ ] Any required documentation changes (code and docs) are included in this PR.
- [ ] API changes follow the Runbook for Firecracker API changes.
- [ ] User-facing changes are mentioned in
CHANGELOG.md. - [ ] All added/changed functionality is tested.
- [ ] New
TODOs link to an issue. - [x] Commits meet contribution quality standards.
- [ ] This functionality cannot be added in
rust-vmm.
Codecov Report
Attention: Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.
Project coverage is 82.09%. Comparing base (
6d6caf4) to head (7bd4d35).
| Files | Patch % | Lines |
|---|---|---|
| src/jailer/src/env.rs | 66.66% | 1 Missing :warning: |
| src/jailer/src/main.rs | 0.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #4438 +/- ##
==========================================
+ Coverage 82.08% 82.09% +0.01%
==========================================
Files 255 255
Lines 31257 31296 +39
==========================================
+ Hits 25656 25694 +38
- Misses 5601 5602 +1
| Flag | Coverage Δ | |
|---|---|---|
| 4.14-c5n.metal | 79.59% <96.29%> (+0.02%) |
:arrow_up: |
| 4.14-m5n.metal | 79.57% <96.29%> (+0.02%) |
:arrow_up: |
| 4.14-m6a.metal | 78.80% <96.29%> (+0.02%) |
:arrow_up: |
| 4.14-m6g.metal | 76.63% <96.29%> (+0.03%) |
:arrow_up: |
| 4.14-m6i.metal | 79.57% <96.29%> (+0.01%) |
:arrow_up: |
| 4.14-m7g.metal | 76.63% <96.29%> (+0.03%) |
:arrow_up: |
| 5.10-c5n.metal | 82.11% <96.29%> (+0.02%) |
:arrow_up: |
| 5.10-m5n.metal | 82.09% <96.29%> (+0.01%) |
:arrow_up: |
| 5.10-m6a.metal | 81.41% <96.29%> (+0.02%) |
:arrow_up: |
| 5.10-m6g.metal | 79.40% <96.29%> (+0.02%) |
:arrow_up: |
| 5.10-m6i.metal | 82.09% <96.29%> (+0.02%) |
:arrow_up: |
| 5.10-m7g.metal | 79.40% <96.29%> (+0.02%) |
:arrow_up: |
| 6.1-c5n.metal | 82.11% <96.29%> (+0.02%) |
:arrow_up: |
| 6.1-m5n.metal | 82.10% <96.29%> (+0.02%) |
:arrow_up: |
| 6.1-m6a.metal | 81.41% <96.29%> (+0.02%) |
:arrow_up: |
| 6.1-m6g.metal | 79.39% <96.29%> (+0.01%) |
:arrow_up: |
| 6.1-m6i.metal | 82.09% <96.29%> (+0.02%) |
:arrow_up: |
| 6.1-m7g.metal | 79.40% <96.29%> (+0.02%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hello @cm-iwata ,
Thank you for your contribution! Could you please fix clippy test errors? https://buildkite.com/firecracker/firecracker-pr/builds/8932#018d8d15-d42f-4863-969e-4df250266426/55-402
@zulinx86 Sorry. fixed it.
@cm-iwata
Could you please do the following things?:
- Add a commit message explaining the context like why this commit is needed ( https://buildkite.com/firecracker/firecracker-pr/builds/8976#018d9cd9-1a22-4508-b4bb-f1da0279335a/55-182 )
- Fix all the unit test failures ( https://buildkite.com/firecracker/firecracker-pr/builds/8976#018d9cd9-1a2e-407f-b159-6eca360663da )
- Post the result here when running it in parallel as instructed in #4412 since we cannot still run all the rust unit tests in parallel at the moment.
Thanks!
It's a matter of taste, but it might be better to use "test(jailer)" rather than "fix(jailer)". "fix" may look like fixing a bug of jailer. I believe it's like improvement of jailer unit tests.
@zulinx86 Thank you for the review!
Add a commit message explaining the context like why this commit is needed
fixed.
Fix all the unit test failures
I believe I've probably fixed the relevant parts. Since the tests were passing in my dev environment before the fix, I haven't been able to confirm correctly that the fix made the tests pass. Also, since some unit tests are failing in my dev environment, I haven't been able to confirm that all unit tests are passing. Could you please run CI?
※I use GCEfor dev environment.
Post the result here when running it in parallel as instructed in Fix running cargo test in parallel for vmm #4412 since we cannot still run all the rust unit tests in parallel at the moment.
here is result.
root@dbaa0c959c45:/firecracker# cargo test --package jailer
warning: patch for `kvm-bindings` uses the features mechanism. default-features and features will not take effect because the patch dependency does not support this mechanism
Finished test [unoptimized + debuginfo] target(s) in 0.10s
Running unittests src/main.rs (build/cargo_target/debug/deps/jailer-2ccd13efbd56ca77)
running 32 tests
test cgroup::tests::test_cgroup_builder_no_mounts ... ok
test cgroup::tests::test_get_controller ... ok
test env::tests::test_dup2 ... ok
test env::tests::test_join_netns ... ok
test env::tests::test_parse_resource_limits ... ok
test cgroup::tests::test_inherit_from_parent ... ok
test env::tests::test_copy_exec_to_chroot ... ok
test resource_limits::tests::test_default_resource_limits ... ok
test env::tests::test_validate_exec_file ... ok
test cgroup::tests::test_cgroup_builder_v2 ... ok
test resource_limits::tests::test_display_resource ... ok
test cgroup::tests::test_cgroup_builder_v1_with_v2_mounts ... ok
test resource_limits::tests::test_from_resource ... ok
test resource_limits::tests::test_install ... ok
test resource_limits::tests::test_set_resource_limits ... ok
test tests::test_clean_env_vars ... ok
test env::tests::test_save_exec_file_pid ... ok
test cgroup::tests::test_cgroup_builder_v1 ... ok
test cgroup::tests::test_cgroup_builder_v2_with_v1_mounts ... ok
test tests::test_to_cstring ... ok
test env::tests::test_copy_cache_info ... ok
test env::tests::test_cgroups_parsing ... ok
test env::tests::test_setup_jailed_folder ... ok
test env::tests::test_userfaultfd_dev ... ok
test env::tests::test_mknod_and_own_dev ... ok
test cgroup::tests::test_cgroup_build ... ok
test cgroup::tests::test_cgroup_build_invalid ... ok
test cgroup::tests::test_cgroup_v2_write_value ... ok
test env::tests::test_new_env ... ok
test tests::test_fds_close_range ... ok
test tests::test_sanitize_process ... ok
test tests::test_fds_proc ... ok
test result: ok. 32 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
@cm-iwata Could you please squash 3134792ac78cd256b9b3b0eeebc4dfb6b44d5f87 and 33ced7453eb12edc785cb4c3a9e5d58a0cf1ed04 into one commit? Two identical commits are shown in https://github.com/firecracker-microvm/firecracker/pull/4438/commits.
Could you please squash 3134792 and 33ced74 into one commit?
Sorry. I made a mistake while rebasing. I squashed tow commits.
I thought the code was functioning correctly after the fix, but I've noticed occasional test failures. Could you please advise on the following issues?
1.Too many open files
Sometimes tests fail as follows.
---- tests::test_fds_proc stdout ----
thread 'tests::test_fds_proc' panicked at src/jailer/src/main.rs:409:33:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
---- tests::test_fds_close_range stdout ----
thread 'tests::test_fds_close_range' panicked at src/jailer/src/main.rs:409:33:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- tests::test_sanitize_process stdout ----
thread 'tests::test_sanitize_process' panicked at src/jailer/src/main.rs:409:33:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
The code causing this issue can be found here.
https://github.com/firecracker-microvm/firecracker/blob/b2eee1924637d659f551bb397cc4f427598d71b4/src/jailer/src/main.rs#L405-L408
When running the tests, 100 files are create per thread, it causing the file descriptor limit to be exceeded when running tests in multithread. Changing the loop condition n from 100 to 10 resolved the issue. Is there any specific reason for using the number 100? If not, would it be acceptable to change n to 10?
2. Bad file descriptor
Even after fixing 1, tests may still fail.
---- env::tests::test_new_env stdout ----
thread 'env::tests::test_new_env' panicked at src/jailer/src/env.rs:911:14:
This another new environment should be created successfully.: ReadLine("/tmp/firecracker/test/MCzQ/jailer/proc/mounts", Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I believe the cause is the following code.
https://github.com/firecracker-microvm/firecracker/blob/b2eee1924637d659f551bb397cc4f427598d71b4/src/jailer/src/main.rs#L442-L445
The error should occur in the following sequence:
- thread1 opens fd4.
- thread2 opens fd5.
- thread1 closes fd4 and fd5.
- thread 2 access to fd5, then Bad file descriptor error will occur
Commenting out the above test cases allows the tests to run successfully. Do you have any suggestions for how to fix the code? I couldn't think of any good ideas, so I would appreciate any advice.
Hi @cm-iwata ,
I really apologize for my very late response :( I was overwhelmed with other prioritized tasks...
I'll definitely come back to this PR soon (at least during this week). I appreciate your understanding and patience.
Please let me know if you came up with something for the bad file descriptor issue.
Thanks,
Hi @cm-iwata,
The change itself looks good to me. Could you squash all the commits into one, because the later 4 commits are refactoring for the first one?
Regarding the "Too many open files" issue, I found a pretty weird behavior of Rust test. I inserted a code running ulimit -n at the beginning of run_close_fds_test() like this.
let output = std::process::Command::new("sh")
.arg("-c")
.arg(format!("ulimit -n"))
.output()
.expect("Failed to run shell command");
let stdout = String::from_utf8_lossy(&output.stdout);
println!("ulimit -n: {stdout}");
When it succeeds, the value is 2047. On the other hand, when failing, it is 100.
# cargo test -p jailer -- --nocapture
...
ulimit -n: 2047
ulimit -n: 2047
...
test result: ok. 30 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.27s
# cargo test -p jailer -- --nocapture
...
ulimit -n: 100
ulimit -n: 100
...
thread 'thread 'tests::test_sanitize_processtests::test_fds_close_range' panicked at ' panicked at src/jailer/src/main.rssrc/jailer/src/main.rs::418418::3333:
:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test tests::test_sanitize_process ... FAILED
test tests::test_fds_close_range ... FAILED
...
I'm investigating the root cause of it...
Regarding the second issue (Bad file descriptor), I'm not able to reproduce the issue on my side. Could you provide me reproduction steps?
Thanks,
Now I found the root cause of different open file limit numbers between runs. Jailer has another test (test_install()) that imposes a new limit (100). If the test is concurrently running with them, those tests will fail because they open 100 files (+ 3 (stdin / stdout / stderr)). We can increase the limit of test_install() to 1000 since its default value is 2048 - 1.
https://github.com/firecracker-microvm/firecracker/blob/a364da806f8093e8d8ab1a8287be4a0efd4e4658/src/jailer/src/resource_limits.rs#L196
https://github.com/firecracker-microvm/firecracker/blob/a364da806f8093e8d8ab1a8287be4a0efd4e4658/src/jailer/src/resource_limits.rs#L12