ARM Thumb BL instruction always jumps to itself

Open ForceBru opened this issue 4 years ago • 0 comments

This assembly:

start:
    add r0, r0, #1
    add r1, r1, #2
    bl start
    b start

When assembled by Keystone 0.9.1 (built from the latest commit 23b54ce7493575d13ac88982f30ab523c3d5a3b1) for architecture KS_ARCH_ARM in mode KS_MODE_THUMB and then disassembled by Capstone 5.0.0, it produces this:

0000|	00f10100	add.w	r0, r0, #1
0004|	01f10201	add.w	r1, r1, #2
0008|	fff7feff	bl   	#8
000c|	f8e7    	b    	#0

As you can see, bl start was encoded as bl #8. That instruction is itself located at address 8, so it jumps to itself instead of to start (address 0x00).
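To double-check the disassembler, the branch target can be decoded by hand. The sketch below assumes the 32-bit Thumb BL layout (encoding T1 in the ARM Architecture Reference Manual) and that the PC reads as the instruction address plus 4. The function name thumb_bl_target is made up for this illustration and is not part of Keystone or Capstone.

```python
import struct


def thumb_bl_target(encoding: bytes, address: int) -> int:
    """Decode a 32-bit Thumb BL located at `address` and return its target.

    Hypothetical helper for illustration only.
    """
    # The instruction is stored as two little-endian halfwords.
    hw1, hw2 = struct.unpack('<HH', encoding)
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        raise ValueError('not a 32-bit Thumb BL instruction')

    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF

    # I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S)
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)

    # offset = SignExtend(S:I1:I2:imm10:imm11:'0'), a 25-bit signed value
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25

    # In Thumb state the PC reads as the instruction address + 4.
    return address + 4 + offset


# The bytes Keystone emitted at address 8:
print(thumb_bl_target(bytes.fromhex('fff7feff'), 8))
```

The encoded offset comes out as -4, which cancels the +4 PC bias, so the target equals the instruction's own address.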

It looks like this happens because Clang deliberately emits a placeholder offset and leaves it to the linker to fill in the real one. As shown in my gist, clang asm.s generates correct offsets, but clang -c asm.s always generates the "jump to itself" code (f7ff fffe as halfwords, i.e. the bytes ff f7 fe ff). This doesn't happen with some versions of Clang, nor with Clang natively targeting armv7-apple-darwin. See also my question on Stack Overflow about this issue.
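If the emitted offset really is just a placeholder, one possible workaround is to re-encode the BL by hand after assembling. This is only a sketch under that assumption, not what Keystone or a linker actually does: encode_thumb_bl and patch_bl are hypothetical helpers, the caller has to know where the BL sits and where the label is, and interworking (BLX, the Thumb bit) is ignored.

```python
import struct


def encode_thumb_bl(address: int, target: int) -> bytes:
    """Encode a 32-bit Thumb BL at `address` that branches to `target`.

    Hypothetical helper for illustration only.
    """
    # The offset is relative to the PC, which reads as address + 4.
    offset = target - (address + 4)
    if offset % 2 or not -(1 << 24) <= offset < (1 << 24):
        raise ValueError('offset cannot be encoded in a Thumb BL')

    imm = offset & 0x1FFFFFF          # 25-bit two's complement
    s = (imm >> 24) & 1
    i1 = (imm >> 23) & 1
    i2 = (imm >> 22) & 1
    imm10 = (imm >> 12) & 0x3FF
    imm11 = (imm >> 1) & 0x7FF

    # J1 = NOT(I1) XOR S, J2 = NOT(I2) XOR S
    j1 = (1 - i1) ^ s
    j2 = (1 - i2) ^ s

    hw1 = 0xF000 | (s << 10) | imm10
    hw2 = 0xD000 | (j1 << 13) | (j2 << 11) | imm11
    return struct.pack('<HH', hw1, hw2)


def patch_bl(code: bytes, address: int, target: int, base: int = 0) -> bytes:
    """Return a copy of `code` with the BL at `address` retargeted."""
    start = address - base
    return code[:start] + encode_thumb_bl(address, target) + code[start + 4:]


# Machine code from this issue; the BL sits at address 8, `start` is at 0.
code = bytes.fromhex('00f1010001f10201fff7fefff8e7')
print(patch_bl(code, address=8, target=0).hex())
```

With the BL at address 8 and start at 0, the required offset is 0 - (8 + 4) = -12, so the patched instruction bytes differ from the placeholder only in the low bits of the second halfword.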

For some reason, this seems to happen only with code that jumps backwards, not forwards.


Here's the full code to reproduce the bug:

import keystone as ks
import capstone as cs
import unicorn as uc

print(f'Keystone {ks.__version__}\nCapstone {cs.__version__}\nUnicorn {uc.__version__}\n')


code = '''
start:
    add r0, r0, #1
    add r1, r1, #2
    bl start
    b start
'''

assembler = ks.Ks(ks.KS_ARCH_ARM, ks.KS_MODE_THUMB)
disassembler = cs.Cs(cs.CS_ARCH_ARM, cs.CS_MODE_THUMB)
emulator = uc.Uc(uc.UC_ARCH_ARM, uc.UC_MODE_THUMB)

machine_code, _ = assembler.asm(code)
machine_code = bytes(machine_code)
print(machine_code.hex())

initial_address = 0
for addr, size, mnem, op_str in disassembler.disasm_lite(machine_code, initial_address):
    instruction = machine_code[addr:addr + size]
    print(f'{addr:04x}|\t{instruction.hex():<8}\t{mnem:<5}\t{op_str}')

emulator.mem_map(initial_address, 1024)
emulator.mem_write(initial_address, machine_code)
emulator.hook_add(uc.UC_HOOK_CODE, lambda emu, addr, size, user_data: print(f'Address: {addr}'))
emulator.emu_start(initial_address | 1, initial_address + len(machine_code), timeout=500)

Output of the above code:

python3 "run_ARM_bug.py"
Keystone 0.9.1
Capstone 5.0.0
Unicorn 1.0.2

00f1010001f10201fff7fefff8e7
0000|	00f10100	add.w	r0, r0, #1
0004|	01f10201	add.w	r1, r1, #2
0008|	fff7feff	bl   	#8
000c|	f8e7    	b    	#0
Address: 0
Address: 4
Address: 8
Address: 8
Address: 8
< repeats over and over again >
Address: 8
Address: 8
Address: 8
Traceback (most recent call last):
  File "run_ARM_bug.py", line 32, in <module>
    emulator.emu_start(initial_address | 1, initial_address + len(machine_code), timeout=500)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/unicorn-1.0.2rc3-py3.7.egg/unicorn/unicorn.py", line 317, in emu_start
unicorn.unicorn.UcError: Emulation timed out (UC_ERR_TIMEOUT)

ForceBru, Jun 04 '20 17:06