criu
issues in mips version
I was able to build CRIU but could not dump a process. When I ran criu check --all, there were no first-category errors. My environment is listed below: CPU: Loongson 3A4000; kernel: 4.19.0-12-loongson-3; CRIU version: 3.15
Hi,
Please provide dump.log and show how you are invoking the criu command (with all arguments).
I issued criu like this "sudo criu/criu/criu dump --shell-job -v4 -o dump.log -t 24987 -D imgs" dump.log
According to your dump.log this seems to be the problem:
(00.017056) Error (criu/parasite-syscall.c:88): si_code=4 si_pid=24987 si_status=10
(00.017064) Error (criu/parasite-syscall.c:95): 24987 was stopped by 10 unexpectedly
The mips support was done by @sunny868 (if I remember it correctly), maybe @sunny868 knows why it does not work.
Looks like your process (pid=24987) has received a SIGBUS signal:
$ cat mips/include/uapi/asm/signal.h | grep 10
#define SIGBUS 10 /* BUS error (4.2 BSD). */
It may mean that CRIU issued an unaligned memory access.
What kind of process have you been trying to dump? Do you have the same problem with any other processes? It's important for us to understand whether CRIU doesn't work at all for you or you only have the problem with a particular program.
Would it be possible for you to provide access to your MIPS machine so we can take a look and try to debug?
@Aatrox00 do you see this error with CRIU 3.16 as well?
It looks like the victim process crashes here:
static int parasite_init_daemon(struct parasite_ctl *ctl)
{
	...
	if (prepare_tsock(ctl, pid, args))
		goto err;

	/* after this we can catch parasite errors in chld handler */
	if (setup_child_handler(ctl)) <-- ok, because we have chld handler called
		goto err;

	regs = ctl->orig.regs;
	if (parasite_run(pid, PTRACE_CONT, ctl->parasite_ip, ctl->rstack, &regs, &ctl->orig)) <-- SIGBUS after jumping into parasite blob
		goto err;

	futex_wait_while_eq(&args->daemon_connected, 0);
@Aatrox00,
could you try reverting the commit ("compel: don't mmap parasite as RWX"), rebuilding CRIU (please use make clean && make to perform a full rebuild including the parasite blob) and running criu dump ... again?
@Aatrox00 do you see this error with CRIU 3.16 as well?
yeah the same
Looks like your process (pid=24987) has received a SIGBUS signal:
$ cat mips/include/uapi/asm/signal.h | grep 10
#define SIGBUS 10 /* BUS error (4.2 BSD). */
It may mean that CRIU issued an unaligned memory access.
What kind of process have you been trying to dump? Do you have the same problem with any other processes? It's important for us to understand whether CRIU doesn't work at all for you or you only have the problem with a particular program.
Would it be possible for you to provide access to your MIPS machine so we can take a look and try to debug?
The process I tried to dump is just a simple single-threaded program. The same program works on my x86 machine. Actually, I've tried dumping several different programs, but none of them worked. As for providing access to the machine, since it doesn't have a public IP address, it can't be accessed through ssh.
@Aatrox00
As for providing access to the machine, since it doesn't have a public IP address, it can't be accessed through ssh.
That's not a problem for us ;) As an option, we can set up a reverse ssh tunnel from your machine to some machine controlled by CRIU devs. But let's try to make an initial guess and narrow down the problem before taking extraordinary measures :)
I repeat my question:
could you try reverting the commit ("compel: don't mmap parasite as RWX"), rebuilding CRIU (please use make clean && make to perform a full rebuild including the parasite blob) and running criu dump ... again?
I tried this just now. It didn't work. The dump.log is just the same as before.
Ok then I will try to reproduce this on Qemu VM.
Upd.
@Aatrox00 which GNU/Linux distro are you using on your MIPS machine?
Thx for your help. I am using Loongnix-20.mips64el.rc2 (http://ftp.loongnix.cn/os/loongnix/20/mips64el/isos/). I've also tried Debian with kernel version 5.10.64.
Hi @Aatrox00,
I've experimented with MIPS in VM on amd64. Sigh. :)
First of all, qemu-system-mips64el -cpu Loongson-3A4000 doesn't work for me at all (it doesn't start kernel boot).
Ok,
qemu-system-mips64el \
-cdrom debian-11.0.0-mipsel-netinst.iso \
-hda disk_malta.qcow2 \
-M malta \
-cpu 5KEc \
-smp 1 \
-kernel vmlinuz-5.10.0-8-5kc-malta \
-boot d \
-initrd initrd.img-5.10.0-8-5kc-malta \
-m 2G \
-nographic \
-device virtio-net-pci,netdev=eth0 -netdev type=user,id=eth0,hostfwd=tcp::2222-:22 \
-virtfs local,path=.,mount_tag=host0,security_model=mapped,id=host0 \
-append "root=/dev/sda1 nokaslr"
worked for me, but CRIU compilation took about 20 minutes. I also caught:
[19119.370910] do_page_fault(): sending SIGSEGV to compel-host-bin for invalid read access from 0000000000000000
[19119.371543] epc = 000000fff39409e0 in libc-2.31.so[fff388d000+1b5000]
[19119.371846] ra = 000000fff3920adc in libc-2.31.so[fff388d000+1b5000]
[19144.846684] do_page_fault(): sending SIGSEGV to compel-host-bin for invalid read access from 0000000000000000
[19144.849278] epc = 000000fff36ce9e0 in libc-2.31.so[fff361b000+1b5000]
[19144.850421] ra = 000000fff36aeadc in libc-2.31.so[fff361b000+1b5000]
[19397.138507] do_page_fault(): sending SIGSEGV to compel-host-bin for invalid read access from 0000000000000000
[19397.139118] epc = 000000fff3ab69e0 in libc-2.31.so[fff3a03000+1b5000]
[19397.139562] ra = 000000fff3a96adc in libc-2.31.so[fff3a03000+1b5000]
Seems like something is totally wrong with compel.
Perhaps it's better to move to our second plan of using a hardware node to debug the problem, or to wait until @sunny868 comes and saves us :) Starting tomorrow I will be on vacation with (possibly) poor internet for about 10 days. So I can try to take a look at your problem today or... after vacation.
Thanks, Alex
@mihalicyn @Aatrox00 Sorry, I've been busy with other things recently. I will check this problem as soon as possible.
[19119.370910] do_page_fault(): sending SIGSEGV to compel-host-bin for invalid read access from 0000000000000000
[19119.371543] epc = 000000fff39409e0 in libc-2.31.so[fff388d000+1b5000]
[19119.371846] ra = 000000fff3920adc in libc-2.31.so[fff388d000+1b5000]
[19144.846684] do_page_fault(): sending SIGSEGV to compel-host-bin for invalid read access from 0000000000000000
[19144.849278] epc = 000000fff36ce9e0 in libc-2.31.so[fff361b000+1b5000]
[19144.850421] ra = 000000fff36aeadc in libc-2.31.so[fff361b000+1b5000]
[19397.138507] do_page_fault(): sending SIGSEGV to compel-host-bin for invalid read access from 0000000000000000
[19397.139118] epc = 000000fff3ab69e0 in libc-2.31.so[fff3a03000+1b5000]
[19397.139562] ra = 000000fff3a96adc in libc-2.31.so[fff3a03000+1b5000]
Thanks again for your help. I got the same error messages on the Loongson 3A4000 machine.
Hi @mihalicyn, how's your vacation? It's been ten days since we exchanged messages last time. I'm wondering when it's a suitable time for you to help me to debug on the hardware node? Thanks.
Hi @Aatrox00, I've returned from vacation :) All fine.
Sure, I'm ready to take a look. We can get in touch via our Gitter https://gitter.im/save-restore/CRIU, Google Hangouts, email and so on. My e-mail is [email protected] (Google Hangouts uses the same address).
Thanks to @Aatrox00 for providing a working node.
I've managed to reproduce the issue, and it looks like our MIPS support does not work at all on Loongson 3A4000 processors.
1st problem (almost obvious):
./compel/compel cflags crashes with a segmentation fault.
The problem is that the cflags field is not initialized for MIPS here:
https://github.com/checkpoint-restore/criu/blob/criu-dev/compel/src/main.c#L57
#elif defined CONFIG_S390
	.arch = "s390",
	.cflags = COMPEL_CFLAGS_PIE,
#elif defined CONFIG_MIPS
	.arch = "mips", <--- we have to have at least cflags
#else
#error "CONFIG_<ARCH> not defined, or unsupported ARCH"
#endif
};
and we are crashing there: https://github.com/checkpoint-restore/criu/blob/criu-dev/compel/src/main.c#L174
printf("%s\n", compat ? flags.cflags_compat : flags.cflags);
Okay, this is fixed and I've moved to the next step. I've tried to play with "fdspy" compel example:
lx@lx-pc:~/criu-3.16.1/compel/test/fdspy$ make
gcc -O2 -g -Wall -Werror -I/home/lx/criu-3.16.1/include/ -o victim victim.c
gcc -O2 -g -Wall -Werror -I/home/lx/criu-3.16.1/include/ -c -o parasite.o parasite.c
ld -r -z noexecstack -T ../../../compel/arch/mips/scripts/compel-pack.lds.S -o parasite.po parasite.o ../../../compel/plugins/std.lib.a ../../../compel/plugins/fds.lib.a
ld: ../../../compel/plugins/std.lib.a(parasite-head.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/std.lib.a(infect.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/std.lib.a(syscalls-64.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/std.lib.a(fds.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/std.lib.a(log.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/std.lib.a(string.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/std.lib.a(memcpy.o): warning: linking abicalls files with non-abicalls files
ld: ../../../compel/plugins/fds.lib.a(fds.o): warning: linking abicalls files with non-abicalls files
../../../compel/compel-host hgen -o parasite.h -f parasite.po
Error (compel/arch/mips/src/lib/handle-elf-host.c:20): Unsupported Elf format detected
make: *** [Makefile:26: parasite.h] Error 255
Let's look at the code:
static const unsigned char __maybe_unused elf_ident_64_le[EI_NIDENT] = {
	0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, /* clang-format */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

extern int __handle_elf(void *mem, size_t size);

int handle_binary(void *mem, size_t size)
{
	if (memcmp(mem, elf_ident_64_le, sizeof(elf_ident_64_le)) == 0)
		return __handle_elf(mem, size);

	pr_err("Unsupported Elf format detected\n");
	return -EINVAL;
}
lx@lx-pc:~/criu-3.16.1/compel/test/fdspy$ readelf --header parasite.po
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 01 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 1
Type: REL (Relocatable file)
Machine: MIPS R3000
Version: 0x1
Entry point address: 0x100
Start of program headers: 0 (bytes into file)
Start of section headers: 52672 (bytes into file)
Flags: 0x80000005, noreorder, cpic, mips64r2
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 8
Section header string table index: 7
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .MIPS.abiflags MIPS_ABIFLAGS 0000000000000000 00000040
0000000000000018 0000000000000018 A 0 0 8
[ 2] .text PROGBITS 0000000000000100 00000100
0000000000009350 0000000000000000 WAX 0 0 256
[ 3] .rela.text RELA 0000000000000000 0000adc8
0000000000001fb0 0000000000000018 I 5 2 8
[ 4] .mdebug.abi64 PROGBITS 0000000000000000 00009450
0000000000000000 0000000000000000 0 0 1
[ 5] .symtab SYMTAB 0000000000000000 00009450
0000000000001050 0000000000000018 6 33 8
[ 6] .strtab STRTAB 0000000000000000 0000a4a0
0000000000000927 0000000000000000 0 0 1
[ 7] .shstrtab STRTAB 0000000000000000 0000cd78
0000000000000043 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
There are no program headers in this file.
lx@lx-pc:~/criu-3.16.1/compel/test/fdspy$ hexdump -C parasite.po
00000000 7f 45 4c 46 02 01 01 00 01 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 08 00 01 00 00 00 00 01 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 c0 cd 00 00 00 00 00 00 |................|
00000030 05 00 00 80 40 00 00 00 00 00 40 00 08 00 07 00 |....@.....@.....|
00000040 00 00 40 02 02 02 00 01 00 00 00 00 00 00 00 00 |..@.............|
00000050 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
We can see that at offset 8 (counting from 0) we have 01, while the check above expects 00.
The first 16 bytes of the file belong to the elfhdr e_ident array.
typedef struct elfhdr {
	unsigned char e_ident[EI_NIDENT]; /* ELF Identification */
	Elf32_Half e_type;      /* object file type */
	Elf32_Half e_machine;   /* machine */
	Elf32_Word e_version;   /* object file version */
	Elf32_Addr e_entry;     /* virtual entry point */
	Elf32_Off e_phoff;      /* program header table offset */
	Elf32_Off e_shoff;      /* section header table offset */
	Elf32_Word e_flags;     /* processor-specific flags */
	Elf32_Half e_ehsize;    /* ELF header size */
	Elf32_Half e_phentsize; /* program header entry size */
	Elf32_Half e_phnum;     /* number of program header entries */
	Elf32_Half e_shentsize; /* section header entry size */
	Elf32_Half e_shnum;     /* number of section header entries */
	Elf32_Half e_shstrndx;  /* section header table's "section header string table" entry offset */
} Elf32_Ehdr;
I've tried to understand what this 01 means and have not found anything about it in the Linux kernel code or anywhere else.
According to the documentation, all bytes after offset 8 are padding and should be zero. Okay, I've patched this byte by hand, run make a second time, and got:
lx@lx-pc:~/criu-3.16.1/compel/test/fdspy$ ../../../compel/compel-host hgen -o parasite.h -f parasite.po
Error (compel/src/lib/handle-elf-host.c:641): Unsupported relocation of type 7
Relocation type 7 is R_MIPS_GPREL16, and we really don't handle this relocation type in compel.
At this point I can't understand how MIPS support worked before. If the cflags field wasn't initialized, ./compel/compel cflags crashes, which means that during parasite compilation we got something like:
gcc -O2 -g -Wall -Werror -c -o parasite.o parasite.c
instead of
gcc -O2 -g -Wall -Werror -c -Wstrict-prototypes -fno-stack-protector -nostdlib -fomit-frame-pointer -fpie -I ../../../compel/include/uapi -o parasite.o parasite.c
I can't imagine a situation in which a parasite compiled without the -nostdlib flag works correctly.
Hmm, interesting. We currently have a simple test for cross compilation (.github/workflows/cross-compile.yml), perhaps we need to extend it (or create one) to run zdtm tests as well.
Unfortunately, we don't have our own MIPS node. Aatrox00 temporarily provided his own node for debugging.