criu
criu copied to clipboard
criu: Add `on-demand` option for task restore
These patches include:
- An extension to support restoring processes with
sys_mmapinstead ofsys_preadvto optimize cases when the restored process does not need all the checkpointed pages. Note that the restoration of anonymous memory areas is left unchanged. The resulting performance improvement can be 30.65%(1.808s vs. 1.254s) for the testzdtm/static/mntns_ghost01. Also, it accelerates other tests, e.g.,zdtm/static/file_locks00(1.807s vs. 1.432s) andzdtm/static/unlink_regular(1.936s vs. 1.321s) on my machine. - This extension can be optionally enabled (i.e., by default disabled) with
criu restore -D dir --on-demand. - Proper helper messages are added so that usage of the extension can be viewed via
criu -h. - The extension has passed all the tests, i.e.,
./zdtm.py run -a.
Signed-off-by: Yuhan Yang [email protected]
Post an issue#1712
Not sure why you opened #1712. We can have all discussions here.
For the missing syscalls: You just have to add them to the syscall tables.
Overall: Have a look at all the CI failures and try to fix them. It still fails at many different places.
Most of them failed because when compiling, the file criu/pie/restore-blob.h would have a line #define restorer_sym__export_restore_task.cold 0x475b. I don't know why this happened cause the file was auto-generated. What I've done is changing some codes in criu/pie/restorer.c, function __export_restore_task . When I test the codes, using gcc 5.4.0 wouldn't produce this problem.
It is a strange error. Never seen the that before. gcc 5.4.0 sounds really old (although our CentOS 7 test uses something even older). Maybe try it with a newer gcc and try to reproduce the error locally. I am not familiar with that part of the code, so that I cannot really help you. Hopefully someone else can help you figure out why the same symbol is generated with and without .cold.
The only idea that comes to my mind, is maybe it is related to something with the stack, because you add a couple of new variables. But that is just a wild uneducated guess.
The only idea that comes to my mind, is maybe it is related to something with the stack, because you add a couple of new variables. But that is just a wild uneducated guess.
Yes. Compiling with the original version won't produce the problem even with gcc9, so it might be something with the stack I guess.
By the way, how can I add a sys_pread syscall in arm? I know that the syscall table is in compel/arch/arm/plugins/std/syscalls/syscall.def, and it has a syscall whose name is sys_pread64, but I don't know what's the meaning of code64 and code32. Is it enough to just add a new line pread with same code32 and code64?
Adding a flag -fno-reorder-blocks-and-partition in criu/Makefile seems to be able to solve the cold symbol problem. But I'm concerned that it might have some performance issues.
ci/circleci: test-local-gcc failed at test zdtm/static/pthread_timers in ns. However, I passed the test on my own machine, so I checked out the log and found out that
=== Run 136/423 =====----------- zdtm/static/pthread_timers
userns is supported
timens isn't supported on 5.4.0-1021-gcp
===================== Run zdtm/static/pthread_timers in ns =====================
Start test
./pthread_timers --pidfile=pthread_timers.pid --outfile=pthread_timers.out
Run criu dump
=[log]=> dump/zdtm/static/pthread_timers/232/1/dump.log
------------------------ grep Error ------------------------
b'(00.001254) seccomp: Collected tid_real 236 mode 0'
b"(00.001285) \tSeizing 236's 237 thread"
b'(00.001381) seccomp: Collected tid_real 237 mode 0'
b"(00.001389) \tSeizing 236's 238 thread"
b'(00.001398) Warn (compel/src/lib/infect.c:126): Unable to interrupt task: 238 (Operation not permitted)'
b"(00.001424) \tSeizing 236's 238 thread"
b'(00.001432) Warn (compel/src/lib/infect.c:126): Unable to interrupt task: 238 (Operation not permitted)'
b"(00.001448) \tSeizing 236's 238 thread"
b'(00.001454) Warn (compel/src/lib/infect.c:126): Unable to interrupt task: 238 (Operation not permitted)'
b"(00.001470) \tSeizing 236's 238 thread"
b'(00.001476) Warn (compel/src/lib/infect.c:126): Unable to interrupt task: 238 (Operation not permitted)'
b"(00.001490) \tSeizing 236's 238 thread"
b'(00.001496) Warn (compel/src/lib/infect.c:126): Unable to interrupt task: 238 (Operation not permitted)'
b"(00.001514) \tSeizing 236's 238 thread"
b'(00.001521) Warn (compel/src/lib/infect.c:126): Unable to interrupt task: 238 (Operation not permitted)'
b'(00.001542) Unlock network'
b'(00.001548) Unfreezing tasks into 1'
b'(00.001552) \tUnseizing 232 into 1'
b'(00.001575) \tUnseizing 236 into 1'
b'(00.001603) Error (criu/cr-dump.c:1788): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
############## Test zdtm/static/pthread_timers FAIL at CRIU dump ###############
Send the 9 signal to 232
Wait for zdtm/static/pthread_timers(232) to die for 0.100000
Considering that I use a Ubuntu 20.04 with a kernel 5.11.0-43-generic, is that timens isn't supported on 5.4.0-1021-gcp caused the test failed?
By the way, I also wonder why ci/circleci: test-local-clang failed. The error info is
make -r -R -f /home/circleci/criu/scripts/nmk/scripts/main.mk makefile=Makefile obj=criu/arch/x86 all
criu/Makefile.packages:36: Can not find some of the required libraries
criu/Makefile.packages:37: Make sure the following packages are installed
criu/Makefile.packages:38: RPM based distros: protobuf protobuf-c protobuf-c-devel protobuf-compiler protobuf-devel protobuf-python libnl3-devel libcap-devel python3-future
criu/Makefile.packages:39: DEB based distros: libprotobuf-dev libprotobuf-c-dev protobuf-c-compiler protobuf-compiler python3-protobuf python3-future libnl-3-dev libcap-dev
criu/Makefile.packages:40: To run tests the following packages are needed
criu/Makefile.packages:41: RPM based distros: libaio-devel python3-PyYAML
criu/Makefile.packages:42: DEB based distros: python3-yaml libaio-dev libaio-dev
criu/Makefile.packages:43: *** Compilation aborted. Stop.
Please do not include merge commits in your pull request. Use rebase to apply your patches on the latest criu-dev branch in your pull request. Also, please no fix up commits like the format fix. Just do the fix in the actual commit changing the code.
The zdtm/static/pthread_timers test failure is a flake and happens sometimes. We have not figured out why it happens. Rerunning CI will fix this CI failure.
The other error criu/Makefile.packages:36: Can not find some of the required libraries is not about packages, but is related to one of your changes. Clang is not happy with your changes: error: unknown argument: '-fno-reorder-blocks-and-partition'
Overall please make sure the Author of each commit matches the Signed-off-by exactly. You are using sometimes multiple Signed-off-by, sometimes they are different. Sometimes the author uses a gmail.com address, sometimes not. Try to be consitent.
Please do not include merge commits in your pull request. Use rebase to apply your patches on the latest
criu-devbranch in your pull request. Also, please no fix up commits like the format fix. Just do the fix in the actual commit changing the code.The
zdtm/static/pthread_timerstest failure is a flake and happens sometimes. We have not figured out why it happens. Rerunning CI will fix this CI failure.The other error
criu/Makefile.packages:36: Can not find some of the required librariesis not about packages, but is related to one of your changes. Clang is not happy with your changes:error: unknown argument: '-fno-reorder-blocks-and-partition'Overall please make sure the Author of each commit matches the
Signed-off-byexactly. You are using sometimes multipleSigned-off-by, sometimes they are different. Sometimes the author uses a gmail.com address, sometimes not. Try to be consitent.
I'll try to fix these after I fix the .cold symbol problem, using flag -fno-reorder-blocks-and-partition works well for gcc, however clang doesn't have this flag. That's a pity, maybe I need more time to think about it. Thank you for your reply!
Codecov Report
Merging #1711 (ac5c87f) into criu-dev (8bddd88) will increase coverage by
0.00%. The diff coverage is60.00%.
@@ Coverage Diff @@
## criu-dev #1711 +/- ##
=========================================
Coverage 70.74% 70.74%
=========================================
Files 125 125
Lines 30952 30959 +7
=========================================
+ Hits 21897 21902 +5
- Misses 9055 9057 +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| criu/config.c | 82.52% <50.00%> (-0.13%) |
:arrow_down: |
| criu/cr-restore.c | 67.28% <66.66%> (+0.19%) |
:arrow_up: |
| criu/uffd.c | 79.36% <0.00%> (-0.48%) |
:arrow_down: |
| criu/page-xfer.c | 67.35% <0.00%> (+0.10%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 8bddd88...ac5c87f. Read the comment docs.
Hmm, it's quite strange. Using clang to compile seemed fine at Compat Tests/ build(CLANG), however at X86_64 CLANG Test / build it showed that using the flag that clang doesn’t have again. But I did add a ifeq in Makefile
ifeq ($(HOSTCC), gcc)
CFLAGS += -fno-reorder-blocks-and-partition
endif
So, is there anything special for X86_64 CLANG Test / build?
Hmm, it's quite strange. Using clang to compile seemed fine at
Compat Tests/ build(CLANG), however atX86_64 CLANG Test / buildit showed that using the flag that clang doesn’t have again. But I did add aifeqinMakefileifeq ($(HOSTCC), gcc) CFLAGS += -fno-reorder-blocks-and-partition endifSo, is there anything special for
X86_64 CLANG Test / build?
I guess the reason might be when building the container, in Dockerfile.x86_64, The ARG CC is gcc. So the flag is just added into the CFLAGS. However, at some point, the CC is replaced by CLANG, but I'm not very sure about that.
Hmm, it's quite strange. Using clang to compile seemed fine at
Compat Tests/ build(CLANG), however atX86_64 CLANG Test / buildit showed that using the flag that clang doesn’t have again. But I did add aifeqinMakefileifeq ($(HOSTCC), gcc) CFLAGS += -fno-reorder-blocks-and-partition endifSo, is there anything special for
X86_64 CLANG Test / build?I guess the reason might be when building the container, in
Dockerfile.x86_64, The ARGCCisgcc. So the flag is just added into theCFLAGS. However, at some point, theCCis replaced byCLANG, but I'm not very sure about that.
The reason is that I used HOSTCC to determine instead of CC.
@Yangfisher1 you can use
git rebaseto edit your patches and combine multiple patches into a single one. When you have multiple patches, the automated tests must pass after each patch, not just after the last one. This allows us to use tools such asgit bisect.
@rst0git Does this mean that I have to merge all the commits into one, or does it mean that the commits corresponding to each patch are merged into one?In this case, I add a new option on-demand of criu. So should I just combine all commits into one?
Does this mean that I have to merge all the commits into one, or does it mean that the commits corresponding to each patch are merged into one? So should I just combine all commits into one?
Not necessarily because large commits are difficult to review. You should have each logical change into a separate commit. However, when dividing your changes into a series of commits, you need to ensure that CRIU builds and runs properly after each commit. This helps when other developers are using git bisect to track down a problem.
You can use as an example some of the previous pull request / commits that have introduced new options to CRIU:
--network-lock: https://github.com/checkpoint-restore/criu/pull/1539--stream: https://github.com/checkpoint-restore/criu/commit/7d79a58f4daa9ddda015fc0aaca2a60d156d12cc--pre-dump-mode: https://github.com/checkpoint-restore/criu/commit/20d4920a8bf74d1eceebc076bcc00889ba40e9f7
There are guidelines on how to divide your changes into a series of commits available in: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#separate-your-changes
Thanks! I'll refine my commits with these examples later.
@rst0git Done.
@avagin I've already fixed the type conversion problem and others. It still need some type conversion to compare the address.
Hmm, it's quite strange because I tried to use make indent and it showed me format like this. However, it still failed the code linter. I don't know why.
Hmm, it's quite strange because I tried to use
make indentand it showed me format like this. However, it still failed the code linter. I don't know why.
Make sure to use the same version of clang-format (13) as our CI.
Hmm, it's quite strange because I tried to use
make indentand it showed me format like this. However, it still failed the code linter. I don't know why.Make sure to use the same version of clang-format (13) as our CI.
My bad. I'm using a clang whose version is 11.0.0.
How do you test your changes? Could you add the on-demand option in zdtm.py and execute tests in CI?
How do you test your changes? Could you add the on-demand option in zdtm.py and execute tests in CI?
I just simply replace line 1616, criu/pie/restorer.c
if (args->on_demand_restore) {
into
if (!args->on_demand_restore) {
and then test these changes. I don't know how to add this on-demand option in zdtm.py. Could you please tell me how to do?
I don't know how to add this on-demand option in zdtm.py. Could you please tell me how to do?
@Yangfisher1 perhaps looking at the git history could help. This commit adds the --pre-dump-mode option to zdtm.py
https://github.com/checkpoint-restore/criu/commit/20d4920a8bf74d1eceebc076bcc00889ba40e9f7#diff-c9838f32b66301675f1aaba9ee24003ac88febf1f77cd993feba0db4b75d7ea6
@rst0git Thank you! I'll try it later.
@avagin Now I add this option into zdtm test. Could you please tell me whether it's enough for CI because I followed the commit 20d4920 ? Do I need to add some extra test for this new feature? I'm not sure how to add new tests.
@Yangfisher1 you might need to update scripts/ci/run-ci-tests.sh to run the CI tests with --on-demand.
@rst0git Get it. I'll update this file.
It seems like the test aarch64 Fedora Rawhide would fail when compiling, quite strange.