criu icon indicating copy to clipboard operation
criu copied to clipboard

CRIU 3.18: unittest and zdtm test failed on both x86_64 and arm64

Open ancientmodern opened this issue 2 years ago • 6 comments

Description

When running make unittest and make test using criu v3.18 on an arm64 VM, both tests unfortunately failed with error I can hardly understood. The same unittest also failed when attempted on an x86 VM, albeit with identical output. The zdtm test on the x86 VM failed on a different test. The detailed information is attached below. Thanks for looking into this :)

Update: They work well when running the simple loop case here.

Steps to reproduce the issue:

  1. Clone criu source code
  2. git checkout v3.18
  3. make unittest
  4. make test or make zdtm && cd test && ./zdtm.py run -a

Describe the results you received:

unit test:

make[2]: Nothing to be done for 'criu/string.c'.
make[2]: Nothing to be done for 'all'.
Error (criu/config.c:127): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:172): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8)
OK
zdtm test 25 terminated on arm64:

=== Run 25/454 ---------------- zdtm/static/deleted_unix_sock
userns is supported
==================== Run zdtm/static/deleted_unix_sock in h ====================
Start test
./deleted_unix_sock --pidfile=deleted_unix_sock.pid --outfile=deleted_unix_sock.out --filename=deleted_unix_sock.test
make: *** [Makefile:490: deleted_unix_sock.pid] Error 1
 Test zdtm/static/deleted_unix_sock FAIL at ['make', '--no-print-directory', '-C', 'zdtm/static', 'deleted_unix_sock.pid']
Test output: ================================
22:40:23.134:    38: ERR: fs.c:106: access check for bit X for current dir path '/root/criu/test/zdtm/static' failed for uid:18943,gid:58467, error: 13(Permission denied). Bit 'x' should be set in all path components of this directory
22:40:23.135:    38: ERR: unix.c:11: failed to get current working directory with valid permissions.
22:40:23.135:    37: ERR: test.c:320: Test exited unexpectedly with code 1

 <<< ================================
##################################### FAIL #####################################
zdtm test 20 show error on arm64 (but PASS):

=== Run 20/454 ---------------- zdtm/static/timens_nested
userns is supported
===================== Run zdtm/static/timens_nested in uns =====================
Start test
Test is SUID
./timens_nested --pidfile=timens_nested.pid --outfile=timens_nested.out
Run criu dump
=[log]=> dump/zdtm/static/timens_nested/46/1/dump.log
------------------------ grep Error ------------------------
b'(00.000777) Will take mnt namespace in the image'
b'(00.000778) Add mnt ns 14 pid 46'
b'(00.000780) Will take user namespace in the image'
b'(00.000780) Add user ns 15 pid 46'
b"(00.000806) Error (criu/namespaces.c:460): Can't dump nested time namespace for 51"
b"(00.000807) Error (criu/namespaces.c:727): Can't make timens id"
b'(00.000810) net: Unlock network'
b'(00.000811) Running network-unlock scripts'
b'iptables-restore: line 6 failed'
b'(00.001692) Error (criu/util.c:643): exited, status=1'
b'ip6tables-restore: line 6 failed'
b'(00.002581) Error (criu/util.c:643): exited, status=1'
b'(00.002587) Unfreezing tasks into 1'
b'(00.002588) \tUnseizing 46 into 1'
b'(00.002593) \tUnseizing 50 into 1'
b'(00.002599) \tUnseizing 51 into 1'
b'(00.002607) Error (criu/cr-dump.c:2093): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
Send the 15 signal to  46
Wait for zdtm/static/timens_nested(46) to die for 0.100000
Removing dump/zdtm/static/timens_nested/46
===================== Test zdtm/static/timens_nested PASS ======================
====================== Run zdtm/static/timens_nested in h ======================
Start test
Test is SUID
./timens_nested --pidfile=timens_nested.pid --outfile=timens_nested.out
Run criu dump
=[log]=> dump/zdtm/static/timens_nested/91/1/dump.log
------------------------ grep Error ------------------------
b'(00.000658) Collected (4 attempts, 0 in_progress)'
b'(00.000661) Collected 92 in 1 state'
b'(00.000667) Collected (3 attempts, 0 in_progress)'
b'(00.000668) Collected 91 in 1 state'
b"(00.000691) Error (criu/namespaces.c:460): Can't dump nested time namespace for 92"
b"(00.000692) Error (criu/namespaces.c:727): Can't make timens id"
b'(00.000694) net: Unlock network'
b'(00.000695) Unfreezing tasks into 1'
b'(00.000696) \tUnseizing 91 into 1'
b'(00.000706) \tUnseizing 92 into 1'
b'(00.000709) Error (criu/cr-dump.c:2093): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
Send the 15 signal to  91
Wait for zdtm/static/timens_nested(91) to die for 0.100000
Removing dump/zdtm/static/timens_nested/91
===================== Test zdtm/static/timens_nested PASS ======================
===================== Run zdtm/static/timens_nested in ns ======================
Start test
Test is SUID
./timens_nested --pidfile=timens_nested.pid --outfile=timens_nested.out
Run criu dump
=[log]=> dump/zdtm/static/timens_nested/137/1/dump.log
------------------------ grep Error ------------------------
b'(00.000769) Will take time namespace in the image'
b'(00.000770) Add time ns 13 pid 137'
b'(00.000773) Will take mnt namespace in the image'
b'(00.000774) Add mnt ns 14 pid 137'
b"(00.000800) Error (criu/namespaces.c:460): Can't dump nested time namespace for 142"
b"(00.000801) Error (criu/namespaces.c:727): Can't make timens id"
b'(00.000804) net: Unlock network'
b'(00.000804) Running network-unlock scripts'
b'iptables-restore: line 6 failed'
b'(00.001686) Error (criu/util.c:643): exited, status=1'
b'ip6tables-restore: line 6 failed'
b'(00.002563) Error (criu/util.c:643): exited, status=1'
b'(00.002569) Unfreezing tasks into 1'
b'(00.002571) \tUnseizing 137 into 1'
b'(00.002591) \tUnseizing 141 into 1'
b'(00.002596) \tUnseizing 142 into 1'
b'(00.002603) Error (criu/cr-dump.c:2093): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
Send the 15 signal to  137
Wait for zdtm/static/timens_nested(137) to die for 0.100000
Removing dump/zdtm/static/timens_nested/137
===================== Test zdtm/static/timens_nested PASS ======================
zdtm test 93 terminated on x86_64:

=== Run 93/454 ===------------- zdtm/static/socket-tcp-closed-last-ack
userns is supported
=============== Run zdtm/static/socket-tcp-closed-last-ack in ns ===============
Start test
Test is SUID
./socket-tcp-closed-last-ack --pidfile=socket-tcp-closed-last-ack.pid --outfile=socket-tcp-closed-last-ack.out
State     Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LAST-ACK  1      1          127.0.0.1:8880     127.0.0.1:60974
TIME-WAIT 0      0          127.0.0.1:60974    127.0.0.1:8880
State    Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LAST-ACK 1      1          127.0.0.1:8880    127.0.0.1:60974
Check TCP images
Running zdtm/static/socket-tcp-closed-last-ack.hook(--post-start)
Running zdtm/static/socket-tcp-closed-last-ack.hook(--pre-dump)
Run criu dump
Running zdtm/static/socket-tcp-closed-last-ack.hook(--pre-restore)
Run criu restore
Running zdtm/static/socket-tcp-closed-last-ack.hook(--post-restore)
Send the 15 signal to  94
Wait for zdtm/static/socket-tcp-closed-last-ack(94) to die for 0.100000
####### Test zdtm/static/socket-tcp-closed-last-ack FAIL at result check #######
Test output: ================================
iptables: Bad rule (does a matching rule exist in that chain?).

 <<< ================================
Running zdtm/static/socket-tcp-closed-last-ack.hook(--clean)

CRIU logs and information:

Output of `criu --version`:

root at ubuntu-arm:~/criu/test [(v3.18)]# criu --version
Version: 3.18
GitID: v3.18-42-gf018893d2

haorong at haorong:/scratch/criu-x86 [(v3.18)]$ criu --version
Version: 3.18
GitID: v3.18-31-gc6ee1ba05
Output of `criu check --all`:

root at ubuntu-arm:~/criu/test [(v3.18)]# criu check --all
Warn  (criu/kerndat.c:1153): Stale /run/criu.kdat file
Warn  (criu/cr-check.c:822): Dirty tracking is OFF. Memory snapshot will not work.
Warn  (criu/cr-check.c:1255): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn  (criu/cr-check.c:1175): CRIU built without CONFIG_COMPAT - can't C/R compatible tasks
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

haorong at haorong:/scratch/criu-x86 [(v3.18)]$ sudo criu check --all
Warn  (criu/kerndat.c:1153): Stale /run/criu.kdat file
sudo: mon_handle_sigchld: waitpid: No child processes
Looks good.

Additional environment details: arm64 kernel config: arm64-config-5.15.0-76-generic.txt

x86_64 kernel config: x86_64-config-5.19.0-43-generic.txt

ancientmodern avatar Jul 07 '23 23:07 ancientmodern

Error (criu/config.c:127): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:172): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8)
OK

These "error" messages above are expected. We are testing if CRIU would fail if an invalid configuration format is used. The message OK at the end indicates that all unit tests have passed: https://github.com/checkpoint-restore/criu/blob/a4bb3f9a063cea2aede654317c53b35b608d042e/criu/unittest/unit.c#L146

The same applies for the timens_nested ZDMT test.

b'(00.000709) Error (criu/cr-dump.c:2093): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
Send the 15 signal to  91
Wait for zdtm/static/timens_nested(91) to die for 0.100000
Removing dump/zdtm/static/timens_nested/91
===================== Test zdtm/static/timens_nested PASS ======================

The crfail flag in the timens_nested.desc file is used to specify that we expect CRIU to fail with this test: https://github.com/checkpoint-restore/criu/blob/a4bb3f9a063cea2aede654317c53b35b608d042e/test/zdtm/static/timens_nested.desc#L1

rst0git avatar Jul 08 '23 08:07 rst0git

@rst0git Thanks Radostin! I forgot that some zdtm cases are meant to fail. Makes sense.

Any idea about the test case that actually "failed"? As I run make test under root, the permission error in arm64 seems odd to me. Also I could not quite understand the error in x86_64 :(

ancientmodern avatar Jul 09 '23 19:07 ancientmodern

access check for bit X for current dir path normally that can be fixed by putting your criu directory to /criu instead of /root/criu, e.g. note the difference

[ ~]# ls /criu -ld
drwxr-xr-x 16 root root 4096 Jun  2 05:37 /criu
[ ~]# ls /root -ld
dr-xr-x--- 11 root root 4096 Jul  5 07:47 /root

Snorch avatar Jul 10 '23 04:07 Snorch

access check for bit X for current dir path normally that can be fixed by putting your criu directory to /criu instead of /root/criu, e.g. note the difference

[ ~]# ls /criu -ld
drwxr-xr-x 16 root root 4096 Jun  2 05:37 /criu
[ ~]# ls /root -ld
dr-xr-x--- 11 root root 4096 Jul  5 07:47 /root

Thank you Pavel, it works! Now the arm64 version fails with a similar iptables error to x86_64 in test case 59:

=== Run 59/454 ==-------------- zdtm/static/socket-tcp-syn-sent
userns is supported
================== Run zdtm/static/socket-tcp-syn-sent in uns ==================
Start test
Test is SUID
Running zdtm/static/socket-tcp-syn-sent.hook(--post-start)
./socket-tcp-syn-sent --pidfile=socket-tcp-syn-sent.pid --outfile=socket-tcp-syn-sent.out
Running zdtm/static/socket-tcp-syn-sent.hook(--pre-dump)
State                Recv-Q             Send-Q                         Local Address:Port                            Peer Address:Port             Process
LISTEN               0                  1                                    0.0.0.0:8880                                 0.0.0.0:*
SYN-SENT             0                  1                                  127.0.0.1:59903                              127.0.0.1:8880
Run criu dump
Running zdtm/static/socket-tcp-syn-sent.hook(--pre-restore)
Run criu restore
Running zdtm/static/socket-tcp-syn-sent.hook(--post-restore)
State                Recv-Q             Send-Q                         Local Address:Port                            Peer Address:Port             Process
LISTEN               0                  1                                    0.0.0.0:8880                                 0.0.0.0:*
SYN-SENT             0                  1                                  127.0.0.1:59903                              127.0.0.1:8880
Check TCP images
Send the 15 signal to  112
Wait for zdtm/static/socket-tcp-syn-sent(112) to die for 0.100000
########## Test zdtm/static/socket-tcp-syn-sent FAIL at result check ###########
Test output: ================================
iptables: Bad rule (does a matching rule exist in that chain?).

 <<< ================================
Running zdtm/static/socket-tcp-syn-sent.hook(--clean)
##################################### FAIL #####################################

ancientmodern avatar Jul 10 '23 21:07 ancientmodern

@ancientmodern this error seems real, you can try to add some debug to confirm that iptables rule disappears:

diff --git a/test/zdtm/static/socket-tcp-syn-sent.c b/test/zdtm/static/socket-tcp-syn-sent.c
index 8265d5fb7..a8488688a 100644
--- a/test/zdtm/static/socket-tcp-syn-sent.c
+++ b/test/zdtm/static/socket-tcp-syn-sent.c
@@ -93,9 +93,13 @@ int main(int argc, char **argv)
                return -1;
        }
 
+       system("iptables -w -t filter -L");
+
        test_daemon();
        test_waitsig();
 
+       system("iptables -w -t filter -L");
+
        snprintf(cmd, sizeof(cmd), "iptables -w -t filter --protocol tcp -D INPUT --dport %d -j DROP", port);
        if (system(cmd))
                return -1;

Snorch avatar Jul 11 '23 02:07 Snorch

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Aug 11 '23 00:08 github-actions[bot]