ccl
Broken executables on Debian sid/Ubuntu bionic
I get an exception while reloading the boot image upon trying to rebuild the core.
Bootstrapping binary: lx86cl from April 2019
Commit: current master
OS: Debian sid
uname -a: Linux phoeframe 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux
Unhandled exception 11 at 0x16623fc0, context->regs at #xffd566a8
Exception occurred while executing foreign code
received signal 11; faulting address: 0x16623fc0
address not mapped to object
Main thread pid 30851
Current Thread Context Record (tcr) = 0xf7bca700
Control (C) stack area: low = 0xffbf3000, high = 0xffd56b34
Value (lisp) stack area: low = 0x679f3000, high = 0x67b3f000
Exception stack pointer = 0xffd56afc
%eax = 0x16623fc0
%ecx = 0xffd56b00
%edx = 0x00000004
%ebx = 0x5988ff00
%esp = 0xffd56afc
%ebp = 0xffd56b08
%esi = 0x0000000c
%edi = 0x6fbd40b6
%eip = 0x16623fc0
%eflags = 0x00010286
%cs = 0x0023
%ds = 0x002b
%ss = 0x002b
%es = 0x002b
%fs = 0x0007
%gs = 0x0063
Lisp memory areas:
code low high
dynamic (9) 0x6fca29c0 0x70cb0000
dynamic (9) 0x6fbc0000 0x6fca29c0
dynamic (9) 0x6fca29c0 0x6fca29c0
dynamic (9) 0x6fca29c0 0x6fca29c0
static (8) 0x12000 0x14000
managed static (7) 0x6bbc0000 0x6bbc0000
readonly (4) 0x67bc0000 0x6bbc0000
tstack (3) 0x67990000 0x679f2000
vstack (2) 0x679f3000 0x67b3f000
cstack (1) 0xffbf3000 0xffd56b34
Lisp kernel vc revision: v1.12-dev.4-35-ge8bd0293
Can't find symbol.
current thread: tcr = 0xf7bca700, native thread ID = 0x7883, interrupts disabled
(#x67B3EFD4) #x6FBD413D : #<Function %MAKE-RWLOCK-PTR #x6FBD40B6> + 135
(#x67B3EFE0) #x6FBD4BE5 : #<Function MAKE-READ-WRITE-LOCK #x6FBD4BCE> + 23
(#x67B3EFE8) #x6FC8144D : #<Anonymous Function #x6FC813AE> + 159
I cannot reproduce this on Travis - the build there happens normally.
I can reproduce the failure on another machine running Debian buster.
Attaching the faulting binaries. Do they work on anyone's machine, or were they somehow compiled incorrectly?
I have reproduced the build on Travis by using ubuntu bionic: https://travis-ci.com/phoe-trash/ccl/builds/134081589
It seems that compiling CCL32 on a new enough Linux fails.
Here's some gdb output that shows what's going on:
(gdb) disas
Dump of assembler code for function _SPffcall:
0x56567908 <+0>: mov %ebx,%eax
0x5656790a <+2>: sar $0x2,%eax
=> 0x5656790d <+5>: test $0x3,%bl
0x56567910 <+8>: je 0x56567915 <_SPffcall+13>
0x56567912 <+10>: mov -0x2(%ebx),%eax
0x56567915 <+13>: push %ebp
(gdb) p/x $ebx
$16 = 0x595dfe80
(gdb) p/x $eax
$17 = 0x16577fa0
(gdb) p rwlock_new
$19 = {rwlock *(void)} 0x56577fa0 <rwlock_new>
(gdb) p/x 0x56577fa0 << 2
$20 = 0x595dfe80
It's crashing during the first FFI call into the kernel, which happens to be to rwlock_new. The address of this function is 0x56577fa0, which doesn't fit in a fixnum. %kernel-import takes the 32-bit pointer and multiplies it by 4 to convert it to a fixnum. This overflows, leaving 0x595dfe80 in the lower 32 bits (EBX above). _SPffcall then shifts right two places to get 0x16577fa0 in EAX, which we call and crash on because it's a bogus address:
LocalLabelPrefix`'ffcall_call:
__(call *%eax)
C(ffcall_return):
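For anyone who wants to check the arithmetic, here is a minimal Python sketch of the overflow described above. The helper names (`to_fixnum_word`, `sar32`, `round_trips`) are made up for illustration and are not CCL functions, and the non-PIE address in the last line is a hypothetical low address, not one taken from a real build.

```python
MASK32 = 0xFFFFFFFF
FIXNUM_SHIFT = 2  # assumption: two tag bits, matching the multiply-by-4 above


def to_fixnum_word(address):
    """Shift a 32-bit address left by two bits, keeping only the low 32 bits."""
    return (address << FIXNUM_SHIFT) & MASK32


def sar32(word, count):
    """Arithmetic right shift of a 32-bit word, like the x86 `sar` instruction."""
    if word & 0x80000000:
        word -= 1 << 32  # reinterpret as a negative two's-complement value
    return (word >> count) & MASK32


def round_trips(address):
    """True if the address survives the shift-left / shift-right pair."""
    return sar32(to_fixnum_word(address), FIXNUM_SHIFT) == address


rwlock_new = 0x56577FA0                 # address reported by gdb
ebx = to_fixnum_word(rwlock_new)        # value %kernel-import ends up with
eax = sar32(ebx, FIXNUM_SHIFT)          # value _SPffcall tries to call

print(hex(ebx))                         # 0x595dfe80, same as EBX in gdb
print(hex(eax))                         # 0x16577fa0, same as EAX in gdb
print(round_trips(rwlock_new))          # False: the top bits are lost
print(round_trips(0x08049000))          # True for a hypothetical low address
```

The top two bits of the address are shifted out and never come back, so any function address at or above 0x20000000 (and below the sign-extended range) gets mangled this way.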
The kernel built on old Ubuntu works because at that time programs weren't built with -pie by default. With a position-independent executable the text section can be loaded anywhere in the lower 3G, so kernel function addresses may no longer fit in a fixnum. See #306 for a workaround that just disables PIE.
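If it helps to check whether a given kernel binary was linked as PIE, a rough way is to look at the `e_type` field of its ELF header. The sketch below is only an illustration: `elf_type` and `looks_like_pie` are hypothetical helper names, and `ET_DYN` is also used for ordinary shared libraries, so this only makes sense when pointed at an executable.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header):
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 5 (EI_DATA) is 1 for little-endian, 2 for big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    return struct.unpack_from(fmt, header, 16)[0]


def looks_like_pie(path):
    """True if the file at `path` has e_type == ET_DYN."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

Calling `looks_like_pie("lx86cl")` on a freshly built kernel would then tell you whether the toolchain defaulted to PIE for that build.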
#306 fixed this