Looks like 32bit syscall test incorrectly detects Allowed platforms

Open pkit opened this issue 3 years ago • 14 comments

Description

//test/syscalls:32bit_test_native                                        FAILED in 0.3s
[ RUN      ] Syscall32Bit.Syscall
test/syscalls/linux/32bit.cc:176: Failure
Death test: ExitGroup32(kSyscall, kExitCode)
    Result: died but not with expected exit code:
            Terminated by signal 11 (core dumped)
Actual msg:

In system log I see:

traps: 32bit_test[3069363] trap invalid opcode ip:416e9000 sp:0 error:0
traps: 32bit_test[3069498] general protection fault ip:7ffdfad70549 sp:0 error:0

/proc/cpuinfo

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 33
model name      : AMD Ryzen 9 5900X 12-Core Processor
stepping        : 0
microcode       : 0xa201016
cpu MHz         : 2200.000
cache size      : 512 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 7386.08
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

Steps to reproduce

Run make syscall-tests on an AMD Ryzen 9 5900X.

runsc version

https://github.com/google/gvisor/commit/2bb73c7bd7dcf0b36e774d8e82e464d04bc81f4b

docker version (if using docker)

Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:57 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:03 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

uname

5.15.0-43-generic #46~20.04.1-Ubuntu SMP Thu Jul 14 15:20:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

pkit, Aug 17 '22 09:08

The native suffix means that the test runs on the host Linux system, without gVisor.

avagin, Oct 11 '22 17:10

Yup, it runs on my host, and fails with invalid opcode, i.e. wrong architecture. So, it's expected? Why?

pkit, Oct 11 '22 18:10

It isn't expected. The syscall instruction has to be supported on AMD hosts.

Here is how you can execute this test manually:
$ GVISOR_PLATFORM_SUPPORT=32BIT:TRUE bazel-bin/test/syscalls/linux/32bit_test --gtest_filter=Syscall32Bit.Syscall

avagin, Oct 12 '22 19:10

$ GVISOR_PLATFORM_SUPPORT=32BIT:TRUE bazel-bin/test/syscalls/linux/32bit_test --gtest_filter=Syscall32Bit.Syscall
Note: Google Test filter = Syscall32Bit.Syscall
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Syscall32Bit
[ RUN      ] Syscall32Bit.Syscall
test/syscalls/linux/32bit.cc:176: Failure
Death test: ExitGroup32(kSyscall, kExitCode)
    Result: died but not with expected exit code:
            Terminated by signal 11 (core dumped)
Actual msg:
[  DEATH   ] 
[  FAILED  ] Syscall32Bit.Syscall (81 ms)
[----------] 1 test from Syscall32Bit (81 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (81 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Syscall32Bit.Syscall

 1 FAILED TEST
Failed to match any benchmarks against regex: .

pkit, Oct 14 '22 11:10

Single-stepping on my machine, it looks like it fails after iret $3 here: https://github.com/google/gvisor/blob/2f57fc1f17a4ee84015def47970e9cee99cd31aa/test/syscalls/linux/32bit.cc#L72
That makes me wonder whether it's kernel behavior, because, as you've said, the actual syscall instruction doesn't fail.

pkit, Oct 14 '22 12:10

Nope, it does.

int main() { __asm__("syscall"); }
$ gcc -o tt -m32 -g tt.c
$ ./tt
Segmentation fault (core dumped)
$ gcc -o tt64 -g tt.c
$ ./tt64
$ 

pkit, Oct 14 '22 12:10

@avagin ^^

pkit, Oct 23 '22 15:10

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot], Sep 13 '23 00:09

I have no idea what it could be. Which syscall instruction is used when you run other 32-bit binaries?

avagin, Sep 14 '23 00:09

@avagin I think I've found it; see the PR and discussion here: https://github.com/DynamoRIO/dynamorio/pull/5037/files
It looks like my guess was correct: the kernel does something strange here. It unconditionally passes control to different code (hardcoded in a vsyscall page), so iret $3 is never reached. In the PR above they work around it with some weird stuff: they place a hook in the vsyscall table and then jump through it...

pkit, Sep 14 '23 01:09

@pkit I think you are right. The kernel doesn't expect 32-bit x86 processes to call sysenter/syscall directly; they have to use the vdso32 trampolines [1]. The problem here is that the kernel can rewrite a process instruction pointer:
https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/common.c#L200

[1] https://elixir.bootlin.com/linux/v6.6-rc1/source/arch/x86/entry/vdso/vdso32/system_call.S#L36
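
As a rough illustration of where those trampolines live (a sketch for clarity, not code from gVisor or from this thread): the kernel advertises the vDSO through the auxiliary vector. On Linux with glibc 2.16 or later, getauxval(AT_SYSINFO_EHDR) returns the base address of the vDSO ELF image, and in 32-bit x86 processes getauxval(AT_SYSINFO) returns the entry point that libc uses for system calls (__kernel_vsyscall). The helper names below are hypothetical.

```c
#include <assert.h>
#include <elf.h>       /* AT_SYSINFO, AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval (glibc >= 2.16) */

/* Hypothetical helper: base address of the vDSO ELF image,
 * or 0 if the kernel did not put AT_SYSINFO_EHDR in the auxiliary vector. */
unsigned long vdso_base(void) {
  return getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: address of the 32-bit syscall trampoline
 * (__kernel_vsyscall). getauxval returns 0 when the entry is absent,
 * which is the usual case in a 64-bit process. */
unsigned long vsyscall_trampoline(void) {
  return getauxval(AT_SYSINFO);
}

/* Hypothetical helper: returns 1 if the memory at `base` starts with the
 * ELF magic bytes, 0 otherwise. `base` must be a mapped, nonzero address. */
int vdso_has_elf_magic(unsigned long base) {
  assert(base != 0);
  return memcmp((const void *)(uintptr_t)base, ELFMAG, SELFMAG) == 0;
}
```

Called from a small main that prints both values, a build with gcc -m32 should normally show a nonzero AT_SYSINFO on kernels that map the 32-bit vDSO, while a 64-bit build typically reports 0 for it. Making system calls through that address (or through libc) lets the kernel return to a location it expects.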

avagin, Sep 14 '23 14:09

The problem here is that the kernel can rewrite a process instruction pointer

100%. So it essentially hardcodes a jump back to int 0x80. Nice.

pkit, Sep 14 '23 14:09
