subhook
trampoline fails on X86_64 due to "endbr64" instruction not handled?
I am trying to create a trampoline to a function that in C is:
void foo(void) {
int a, b, c;
puts("foo() called");
a = random();
b = (random() + a) % (random() & 0xff);
c = a + b;
printf("foo: value of c = %d\n", c);
}
I am using GCC 9.3 on 64-bit Ubuntu 20.04 with Linux kernel 5.8.0-53-generic.
Compiled code is:
(gdb) x/20iw foo
0x5555555552c3 <foo>: endbr64
0x5555555552c7 <foo+4>: push %rbp
0x5555555552c8 <foo+5>: mov %rsp,%rbp
0x5555555552cb <foo+8>: push %rbx
0x5555555552cc <foo+9>: sub $0x18,%rsp
0x5555555552d0 <foo+13>: lea 0xe09(%rip),%rdi # 0x5555555560e0
0x5555555552d7 <foo+20>: callq 0x5555555550d0 <puts@plt>
0x5555555552dc <foo+25>: callq 0x555555555110 <random@plt>
0x5555555552e1 <foo+30>: mov %eax,-0x1c(%rbp)
0x5555555552e4 <foo+33>: callq 0x555555555110 <random@plt>
0x5555555552e9 <foo+38>: mov -0x1c(%rbp),%edx
0x5555555552ec <foo+41>: movslq %edx,%rdx
0x5555555552ef <foo+44>: lea (%rax,%rdx,1),%rbx
0x5555555552f3 <foo+48>: callq 0x555555555110 <random@plt>
0x5555555552f8 <foo+53>: movzbl %al,%ecx
0x5555555552fb <foo+56>: mov %rbx,%rax
0x5555555552fe <foo+59>: cqto
0x555555555300 <foo+61>: idiv %rcx
0x555555555303 <foo+64>: mov %rdx,%rax
0x555555555306 <foo+67>: mov %eax,-0x18(%rbp)
And in bytes it is:
(gdb) x/20xw foo
0x5555555552c3 <foo>: 0xfa1e0ff3 0xe5894855 0xec834853 0x3d8d4818
0x5555555552d3 <foo+16>: 0x00000e09 0xfffdf4e8 0xfe2fe8ff 0x4589ffff
0x5555555552e3 <foo+32>: 0xfe27e8e4 0x558bffff 0xd26348e4 0x101c8d48
0x5555555552f3 <foo+48>: 0xfffe18e8 0xc8b60fff 0x48d88948 0xf9f74899
0x555555555303 <foo+64>: 0x89d08948 0x558be845 0xe8458be4 0x4589d001
subhook_disasm() fails to decode this function. I have tried to understand subhook_disasm(), but I can't tell what the issue (or the fix) might be. Any hints are welcome, including whether the problem is something else entirely.
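For reference, the first word in the byte dump above, 0xfa1e0ff3, is the little-endian form of the bytes F3 0F 1E FA, which is the encoding of endbr64. A minimal sketch of how a decoder could recognize that prefix, assuming a hypothetical helper named skip_endbr64 (it is not part of subhook's API):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of subhook: returns the number of bytes
 * to skip if `code` starts with endbr64 (F3 0F 1E FA), otherwise 0.
 */
static size_t skip_endbr64(const uint8_t *code, size_t len)
{
    static const uint8_t endbr64[4] = { 0xF3, 0x0F, 0x1E, 0xFA };

    if (len >= sizeof(endbr64) &&
        memcmp(code, endbr64, sizeof(endbr64)) == 0) {
        return sizeof(endbr64);
    }
    return 0;
}
```

A disassembler loop could treat those 4 bytes as one instruction of length 4 with no relocatable operand, then continue decoding at the push %rbp that follows.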
On an older 64-bit Linux system with GCC 4.4.7, this function is:
(gdb) x/20iw foo
0x400ac9 <foo>: push %rbp
0x400aca <foo+1>: mov %rsp,%rbp
0x400acd <foo+4>: push %rbx
0x400ace <foo+5>: sub $0x28,%rsp
0x400ad2 <foo+9>: mov $0x400df0,%edi
0x400ad7 <foo+14>: callq 0x4008e0 <puts@plt>
0x400adc <foo+19>: callq 0x400950 <random@plt>
0x400ae1 <foo+24>: mov %eax,-0x1c(%rbp)
0x400ae4 <foo+27>: callq 0x400950 <random@plt>
0x400ae9 <foo+32>: mov -0x1c(%rbp),%edx
0x400aec <foo+35>: movslq %edx,%rdx
0x400aef <foo+38>: lea (%rax,%rdx,1),%rbx
0x400af3 <foo+42>: callq 0x400950 <random@plt>
0x400af8 <foo+47>: and $0xff,%eax
0x400afd <foo+52>: mov %rax,-0x28(%rbp)
0x400b01 <foo+56>: mov %rbx,%rdx
0x400b04 <foo+59>: mov %rdx,%rax
0x400b07 <foo+62>: sar $0x3f,%rdx
0x400b0b <foo+66>: idivq -0x28(%rbp)
0x400b0f <foo+70>: mov %rdx,%rax
Here the trampoline works fine. I believe the important difference is the "endbr64" at the start of the function when using GCC 9.3.
As additional information: I found mentions online that compiling with GCC's "-mmanual-endbr" option may help. I tried it with GCC 9.3, and the function foo now becomes:
0x5555555552bb <foo>: push %rbp
0x5555555552bc <foo+1>: mov %rsp,%rbp
0x5555555552bf <foo+4>: push %rbx
0x5555555552c0 <foo+5>: sub $0x18,%rsp
0x5555555552c4 <foo+9>: lea 0xe15(%rip),%rdi # 0x5555555560e0
0x5555555552cb <foo+16>: callq 0x5555555550d0 <puts@plt>
0x5555555552d0 <foo+21>: callq 0x555555555110 <random@plt>
0x5555555552d5 <foo+26>: mov %eax,-0x1c(%rbp)
0x5555555552d8 <foo+29>: callq 0x555555555110 <random@plt>
0x5555555552dd <foo+34>: mov -0x1c(%rbp),%edx
0x5555555552e0 <foo+37>: movslq %edx,%rdx
0x5555555552e3 <foo+40>: lea (%rax,%rdx,1),%rbx
0x5555555552e7 <foo+44>: callq 0x555555555110 <random@plt>
0x5555555552ec <foo+49>: movzbl %al,%ecx
0x5555555552ef <foo+52>: mov %rbx,%rax
0x5555555552f2 <foo+55>: cqto
0x5555555552f4 <foo+57>: idiv %rcx
0x5555555552f7 <foo+60>: mov %rdx,%rax
0x5555555552fa <foo+63>: mov %eax,-0x18(%rbp)
0x5555555552fd <foo+66>: mov -0x1c(%rbp),%edx
0x555555555300 <foo+69>: mov -0x18(%rbp),%eax
...
So the endbr64 instruction is gone, but now I run into an "offset too large" problem. In gdb I have:
subhook_make_trampoline (trampoline=0x40000000, src=0x5555555552bb
The distance between 0x40000000 and 0x5555555552bb is too large to fit in a 32-bit offset, so I fail at:
#ifdef SUBHOOK_X86_64
if (CHECK_INT32_OVERFLOW(offset)) {
/*
* Oops! It looks like the two locations are too far away from each
* other! This is not going to work...
*/
*trampoline_len = 0;
return -EOVERFLOW;
}
# endif
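The overflow can be reproduced with the two addresses from the gdb output. The sketch below assumes a 5-byte jmp rel32 (opcode E9 plus a 32-bit displacement measured from the end of the instruction). The helper name rel32_reachable is hypothetical and only mirrors the idea behind CHECK_INT32_OVERFLOW, not its actual definition:

```c
#include <stdint.h>

#define JMP_REL32_SIZE 5 /* E9 opcode + 32-bit displacement */

/*
 * Hypothetical helper: returns 1 if a `jmp rel32` located at `from`
 * can reach `to`, i.e. the displacement fits in a signed 32-bit value.
 */
static int rel32_reachable(uint64_t from, uint64_t to)
{
    /* The displacement is relative to the end of the jump instruction. */
    int64_t disp = (int64_t)(to - (from + JMP_REL32_SIZE));

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

For trampoline=0x40000000 and src=0x5555555552bb the displacement is far outside the ±2 GiB range in either direction, so the check fails. With a non-PIE address such as 0x400ac9 from the GCC 4.4.7 build, the same check would pass, assuming the trampoline were allocated at a similar low address.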
Is there no workaround for this problem? Why can't the trampoline code overcome this limit with a RIP-relative jump or something similar? (Sorry, I have limited knowledge of the 64-bit extensions to the Intel instruction set, which I believe came from AMD.)
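One generic way to jump to an arbitrary 64-bit address on x86-64 is an indirect jump through a pointer stored right after the instruction: jmp qword ptr [rip+0] (bytes FF 25 00 00 00 00) followed by the 8-byte absolute target. The sketch below only illustrates that encoding. The name write_abs_jmp64 is hypothetical, and this is not necessarily what subhook or the pull request mentioned in this thread does:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ABS_JMP64_SIZE 14 /* 6-byte jmp [rip+0] + 8-byte target address */

/*
 * Hypothetical helper: writes `jmp qword ptr [rip+0]` followed by the
 * absolute 64-bit target into `buf`, which must hold at least
 * ABS_JMP64_SIZE bytes. Returns the number of bytes written.
 */
static size_t write_abs_jmp64(uint8_t *buf, uint64_t target)
{
    static const uint8_t jmp_rip0[6] = { 0xFF, 0x25, 0x00, 0x00, 0x00, 0x00 };
    size_t i;

    memcpy(buf, jmp_rip0, sizeof(jmp_rip0));

    /* Store the target in little-endian order, as x86-64 expects. */
    for (i = 0; i < 8; i++) {
        buf[sizeof(jmp_rip0) + i] = (uint8_t)(target >> (8 * i));
    }
    return ABS_JMP64_SIZE;
}
```

The trade-off is size: this sequence takes 14 bytes instead of 5, so more of the original function's instructions have to be copied into the trampoline before the jump back.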
Pull request #58 can help address this issue on Linux.