gravity
Buster-based image failures on AWS
Problem Description
Illegal instruction failures in the hook's init containers based on debian:buster images (verified with hook image docker-tall:buster).
Environment
AWS m5a.xlarge instance running AMD EPYC 7571.
Kernel: Linux <hostname> 3.10.0-1062.el7.x86_64 #1 SMP <timestamp> x86_64 GNU/Linux
Symptoms
Pods fail with Init:Error, with the init containers failing with exit code 132, which corresponds to SIGILL.
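For reference, the mapping from exit code 132 to SIGILL follows the usual shell convention of 128 plus the signal number (SIGILL is signal 4 on Linux). A minimal sketch of that decoding, where `describe_exit_code` is a hypothetical helper name and not part of any tool mentioned here:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return the signal name for a shell-style exit code above 128.

    Hypothetical helper for illustration: exit codes above 128 are
    conventionally 128 + the number of the signal that killed the process.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return f"exit code {code} (no matching signal)"
    return f"exit code {code}"


if __name__ == "__main__":
    # 132 - 128 = 4, which is SIGILL on Linux.
    print(describe_exit_code(132))
```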
A manual attempt to start a container based on this image also fails.
The images extracted from the host (docker save and docker load) appear to work in other environments.
Here's a snippet from a gdb session:
(gdb) x/10i $pc
=> 0x7ffff7f4a190: (bad)
0x7ffff7f4a191: and r10,0xfff
0x7ffff7f4a198: sub r10,0x1000
0x7ffff7f4a19f: mov rdx,rcx
0x7ffff7f4a1a2: data16 nop WORD PTR cs:[rax+rax*1+0x0]
0x7ffff7f4a1ad: nop DWORD PTR [rax]
0x7ffff7f4a1b0: add r10,0x10
0x7ffff7f4a1b4: jg 0x7ffff7f4a2b0
0x7ffff7f4a1ba: movdqa xmm0,XMMWORD PTR [rdi+rdx*1]
0x7ffff7f4a1bf: palignr xmm0,XMMWORD PTR [rdi+rdx*1-0x10],0x7
This looks like a jump into the middle of an instruction. Offsetting the disassembly by two bytes reveals this bit:
(gdb) x/10i $pc-2
0x7ffff7f4a18e: lea edx,[rdi+0x7]
0x7ffff7f4a191: and r10,0xfff
0x7ffff7f4a198: sub r10,0x1000
0x7ffff7f4a19f: mov rdx,rcx
0x7ffff7f4a1a2: data16 nop WORD PTR cs:[rax+rax*1+0x0]
0x7ffff7f4a1ad: nop DWORD PTR [rax]
0x7ffff7f4a1b0: add r10,0x10
0x7ffff7f4a1b4: jg 0x7ffff7f4a2b0
0x7ffff7f4a1ba: movdqa xmm0,XMMWORD PTR [rdi+rdx*1]
0x7ffff7f4a1bf: palignr xmm0,XMMWORD PTR [rdi+rdx*1-0x10],0x7
which likely corresponds to the following bit from libc.
No clear culprit so far.
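One cheap check that might help narrow this down: the disassembly above includes palignr, which is an SSSE3 instruction, so it may be worth comparing the CPU flags reported on the failing host with those in an environment where the image works. The sketch below is only a diagnostic aid under that assumption and does not claim this is the cause; `parse_cpu_flags` and `missing_flags` are hypothetical helper names, and the default flag list is just an example.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("ssse3",)):
    """Return the flags from `wanted` that the CPU does not report.

    The default list is an assumption for illustration; extend it as needed.
    """
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # Run this on the host and inside a container, then compare the results.
        print("missing flags:", missing_flags(cpuinfo.read_text()))
```

Running it both on the host and inside a container started from the affected image would show whether the two see the same feature set.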