Support for 32bit applications in a 64bit container

Open uazo opened this issue 3 years ago • 14 comments

Hi @ctalledo,

I am encountering another problem. Here you can see a container that, when run with Sysbox, fails with "Bad system call (core dumped)" when running ./bytecode_builtins_list_generator.

That app is a 32-bit app and my container is 64-bit:

file ./bytecode_builtins_list_generator

./bytecode_builtins_list_generator: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), 
dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, 
BuildID[sha1]=e3f1ebe53993fc8339b9686b316830ea4b64452a, with debug_info, not stripped
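
For anyone who wants to confirm the binary's word size without the file utility, the same information can be read from the ELF header. This is only a minimal sketch: the function names are made up for illustration, and the lookup tables cover just the two classes and two machine types relevant to this thread (values taken from the ELF specification).

```python
import struct

# Values from the ELF specification: e_ident[EI_CLASS] and e_machine.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_MACHINES = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header: bytes) -> str:
    """Describe an ELF file from its first 20 bytes (hypothetical helper)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    # e_ident[EI_DATA]: 1 means little-endian, 2 means big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    machine_name = ELF_MACHINES.get(machine, f"machine {machine}")
    return f"{elf_class}, {machine_name}"


def describe_elf_file(path: str) -> str:
    """Read just enough of the file at `path` to describe it."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

Run against ./bytecode_builtins_list_generator, describe_elf_file should agree with the file output above and report a 32-bit Intel 80386 object.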

Steps to reproduce:

  • sudo DOCKER_BUILDKIT=1 docker build -t uazo/test32bit .
  • sudo docker run --runtime=sysbox-runc -ti --rm uazo/test32bit
  • ./bytecode_builtins_list_generator

Running it without Sysbox works perfectly. Is there any way to enable 32-bit application support in a 64-bit container with Sysbox?

Thank you.

uazo avatar Jul 14 '21 06:07 uazo

@uazo, I was able to reproduce the issue but haven't figured out its root cause yet. I also noticed that the problem is not seen when using the OCI runc (with and without user-ns).

The problem seems to be related to a recvfrom() syscall (id=45) executed by this binary and blocked (apparently) by the kernel's seccomp module. Please verify that you are also seeing this in your journald logs:

Jul 15 00:43:47 ubuntu-focal-vm audit[6192]: SECCOMP auid=4294967295 uid=165536 gid=165536 ses=4294967295 pid=6192 comm="bytecode_builti" exe="/bytecode_builtins_list_generator" sig=31 arch=40000003 syscall=45 compat=1 ip=0xf7f70e3b code=0x0
Jul 15 00:43:47 ubuntu-focal-vm kernel: audit: type=1326 audit(1626309827.690:18): auid=4294967295 uid=165536 gid=165536 ses=4294967295 pid=6192 comm="bytecode_builti" exe="/bytecode_builtins_list_generator" sig=31 arch=40000003 syscall=45 compat=1 ip=0xf7f70e3b code=0x0
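
A note on reading these records: arch=40000003 is the audit token for 32-bit x86 (AUDIT_ARCH_I386 in linux/audit.h), and compat=1 points the same way. Syscall numbers differ between the two ABIs, so syscall=45 probably has to be looked up in the i386 table, where 45 is brk, rather than in the x86_64 table, where 45 is recvfrom. The sketch below decodes a record on that basis. The function name is hypothetical and the syscall tables are deliberately tiny excerpts, not complete tables.

```python
import re

# Audit architecture tokens from <linux/audit.h>.
AUDIT_ARCHES = {0x40000003: "i386", 0xC000003E: "x86_64"}

# Small excerpt of each syscall table; only a few numbers are included
# for illustration, so most lookups fall through to "syscall N".
SYSCALL_NAMES = {
    "i386": {45: "brk", 243: "set_thread_area"},
    "x86_64": {45: "recvfrom", 243: "mq_timedreceive"},
}


def decode_seccomp_record(line: str) -> tuple:
    """Return (arch_name, syscall_name) for an audit SECCOMP record."""
    # Collect key=value pairs; values may be quoted strings.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    # The arch field is printed in hexadecimal without a 0x prefix.
    arch = AUDIT_ARCHES.get(int(fields["arch"], 16), "unknown")
    number = int(fields["syscall"])
    name = SYSCALL_NAMES.get(arch, {}).get(number, f"syscall {number}")
    return arch, name
```

Feeding it the first record above gives the i386 architecture and brk, which would be consistent with a process that dies on its very first syscall after execve().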

rodnymolina avatar Jul 15 '21 00:07 rodnymolina

Strace capture. Crash is triggered early on, right after execve() + brk() execution:

[pid  6320] rt_sigaction(SIGXFSZ, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, NULL, 8) = 0
[pid  6320] rt_sigaction(SIGVTALRM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, NULL, 8) = 0
[pid  6320] rt_sigaction(SIGUSR1, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, NULL, 8) = 0
[pid  6320] rt_sigaction(SIGUSR2, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, NULL, 8) = 0
[pid  6320] rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, {sa_handler=0x5598d0380b30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, 8) = 0
[pid  6320] rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, 8) = 0
[pid  6320] rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feb88eed210}, {sa_handler=0x5598d0380610, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feb88eed210}, 8) = 0
[pid  6320] rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feb88eed210}, {sa_handler=0x5598d0363aa0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feb88eed210}, 8) = 0
[pid  6320] execve("./bytecode_builtins_list_generator", ["./bytecode_builtins_list_generat"...], 0x5598d0f593b0 /* 8 vars */) = 0
strace: [ Process PID=6320 runs in 32 bit mode. ]
[pid  6320] brk(NULL)                   = ?
[pid  6320] +++ killed by SIGSYS (core dumped) +++
<... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSYS && WCOREDUMP(s)}], WSTOPPED|WCONTINUED, NULL) = 58
rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN TTOU], [CHLD], 8) = 0
ioctl(255, TIOCSPGRP, [1])              = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
ioctl(255, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(255, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
ioctl(255, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(255, TIOCGWINSZ, {ws_row=70, ws_col=239, ws_xpixel=0, ws_ypixel=0}) = 0
write(2, "Bad system call (core dumped)\n", 30) = 30
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

rodnymolina avatar Jul 15 '21 01:07 rodnymolina

Thanks @rodnymolina for investigating!

I confirm:

Jul 15 05:50:07 ay audit[547883]: SECCOMP auid=4294967295 uid=165536 gid=165536 ses=4294967295 
pid=547883 comm="bytecode_builti" exe="/bytecode_builtins_list_generator" sig=31 arch=40000003 syscall=45 
compat=1 ip=0xf7f51e3b code=0x0

Jul 15 05:50:07 ay kernel: audit: type=1326 audit(1626328207.689:1679): auid=4294967295 uid=165536 
gid=165536 ses=4294967295 pid=547883 comm="bytecode_builti" exe="/bytecode_builtins_list_generator"
sig=31 arch=40000003 syscall=45 compat=1 ip=0xf7f51e3b code=0x0

In your opinion, from what you've seen so far, can it be fixed? If so, is there a temporary workaround while waiting for the fix (even at the expense of security, since I am only in the test phase for now) so that I can continue my work?

Regarding "Strace capture. Crash is triggered early on, right after execve() + brk() execution":

It's beyond my capabilities, but if I can help in any way, please tell me. I don't think it will be useful to you, but here you can find the sources.

uazo avatar Jul 15 '21 06:07 uazo

I have the same problem, but my strace output is slightly different, so I thought I'd share it here:

$ strace /tmp/32bin
execve("/tmp/32bin", ["/tmp/32bin"], 0x7fff270578b0 /* 64 vars */) = 0
strace: [ Process PID=45108 runs in 32 bit mode. ]
set_thread_area({entry_number=-1, base_addr=0x9a85810, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1} <unfinished ...>) = ?
+++ killed by SIGSYS +++
zsh: invalid system call  strace /tmp/32bin
Jan 29 01:03:37 gke-master-sydney-pool-3-adc98a83-so64 audit[3565791]: SECCOMP auid=101000 uid=101000 gid=101000 ses=276 pid=3565791 comm="32bin" exe="/tmp/32bin" sig=31 arch=40000003 syscall=243 compat=1 ip=0x80aec82 code=0x0
Jan 29 01:03:37 gke-master-sydney-pool-3-adc98a83-so64 audit[3565791]: ANOM_ABEND auid=101000 uid=101000 gid=101000 ses=276 pid=3565791 comm="32bin" exe="/tmp/32bin" sig=31 res=1

deansheather avatar Jan 29 '22 01:01 deansheather

@rodnymolina, do you know if there is a workaround and/or a plan to fix this? We are blocked by this problem.

isarkis avatar Jul 21 '22 17:07 isarkis

Hi @isarkis, apologies for the delayed response, but @rodnymolina has been out of office for the last couple of weeks.

I took a brief look at this issue a couple of days ago but did not spot anything obvious. We will take a closer look next week. I suspect the problem is in the way Sysbox is applying the seccomp filters.

Thanks for giving Sysbox a shot in your infra.

ctalledo avatar Jul 27 '22 15:07 ctalledo

@ctalledo, any luck fixing this issue?

isarkis avatar Aug 19 '22 22:08 isarkis

Hi @isarkis, my apologies, but I've been swamped with other Sysbox-related work and have not had a chance to look into this yet.

Will do my best to get to it this week, thanks for your patience.

ctalledo avatar Aug 22 '22 05:08 ctalledo