Apple M1 / AArch64 .data section not recognised as such
Discussed in https://github.com/NationalSecurityAgency/ghidra/discussions/3658
Originally posted by p-Wave November 20, 2021
Hi all,
I have the following "Hello World" code:
.global _start
.align 2
.text
_start: mov X0, 1
adrp X1, helloworld@PAGE
mov X2, 13
mov X16, 4
svc 0x80
mov X0, 0
mov X16, 1
svc 0x80
.data
helloworld: .ascii "Hello World!\n"
which I compile with
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.1.0
The CodeBrowser in Ghidra doesn't recognise the data section; instead, it gives me the following interpretation (see the bytes starting at 0x20):
//
// __text
// __TEXT
// ram:00000000-ram:0000001f
//
**************************************************************
* *
* FUNCTION *
**************************************************************
undefined ltmp0()
undefined w0:1 <RETURN>
_start XREF[1]: Entry Point(*)
ltmp0
00000000 20 00 80 d2 mov x0,#0x1
00000004 01 00 00 90 adrp x1,0x0
00000008 a2 01 80 d2 mov x2,#0xd
0000000c 90 00 80 d2 mov x16,#0x4
00000010 01 10 00 d4 svc 0x80
00000014 00 00 80 d2 mov x0,#0x0
00000018 30 00 80 d2 mov x16,#0x1
0000001c 01 10 00 d4 svc 0x80
//
// __data
// __DATA
// ram:00000020-ram:0000002c
//
ltmp1
helloworld
00000020 48 65 6c 6c ldnp d8,d25,[x10, #-0x140]
00000024 6f 20 57 6f umlal2 v15.4S,v3.8H,v7.H[0x1]
00000028 72 ?? 72h r
00000029 6c ?? 6Ch l
0000002a 64 ?? 64h d
0000002b 21 ?? 21h !
0000002c 0a ?? 0Ah
What am I missing or doing wrong?
Thank you very much!
This looks to be an issue with the way we are handling the SVC instruction. I assume it is not meant to return in your example, right?
svc is treated like a supervisor-level call instruction, which in many uses may return and continue. This specific case is like calling a non-returning function. Unfortunately, Ghidra's AARCH64 svc semantics use a fall-through pcodeop, CallSupervisor, which prevents a flow override from being applied. Ideally, the semantics for this instruction would be changed to give it call flow, which would allow a non-returning flow override to be applied in this case.
The same potential issue also applies to the hvc and smc instructions.
A similar issue applies to ARM, where the svc and swi instructions use the fall-through pcodeop software_interrupt.
From your snippet, it looks like the .data section is being recognized by Ghidra. Ghidra will follow flow while disassembling, even if that takes it into a non-executable section. In this case, it's following flow that doesn't actually exist.
The issue is that the first instance of the svc instruction is a call to write and the second is a call to exit. exit is a non-returning function, so bytes after calls to it shouldn't be disassembled in general. Basically, knowing what the svc instruction does involves more than the bytes of the instruction: it also depends on the "environment" of the program. We're working on adding a system call analyzer which would automatically figure all of this out during analysis. This is still very much a work in progress.
At the moment, system calls can in some cases be handled manually or by a script; see ResolveX86orX64LinuxSystemCalls.java for an example. Unfortunately, this will only work nicely if the pcode for the relevant instruction is in a certain form. In this case it's not, but I'll try to get a fix in for that. Once that's in place, you could follow the script to handle the system calls.
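To make the point concrete, here is a small Python sketch (not Ghidra code, and not how the planned analyzer works) that walks the bytes from the listing above. It decodes only two A64 instruction forms, MOVZ with no shift and SVC, tracks the last constant moved into x16, and stops following fall-through flow when the syscall number is in a set of non-returning calls. The function name `follow_flow` and the one-entry syscall set are assumptions made for illustration; only the exit number (1) used in this program is included.

```python
import struct

# Bytes copied from the listing: __text (0x00-0x1f) followed by __data.
IMAGE = bytes.fromhex(
    "200080d2" "01000090" "a20180d2" "900080d2" "011000d4"
    "000080d2" "300080d2" "011000d4"
) + b"Hello World!\n"

# Assumption: the only non-returning call we know about is exit (x16 == 1).
DARWIN_NON_RETURNING = frozenset({1})


def follow_flow(image, start=0, non_returning=DARWIN_NON_RETURNING):
    """Return the addresses visited by simple fall-through disassembly."""
    x16 = None          # last constant seen in x16, if any
    visited = []
    addr = start
    while addr + 4 <= len(image):
        (word,) = struct.unpack_from("<I", image, addr)
        visited.append(addr)
        if word & 0xFFE00000 == 0xD2800000:      # MOVZ Xd, #imm16 (no shift)
            if word & 0x1F == 16:                # destination is x16
                x16 = (word >> 5) & 0xFFFF
        elif word & 0xFFE0001F == 0xD4000001:    # SVC #imm16
            if x16 in non_returning:
                break                            # exit never falls through
        addr += 4
    return visited


if __name__ == "__main__":
    print([hex(a) for a in follow_flow(IMAGE)])
    print([hex(a) for a in follow_flow(IMAGE, non_returning=frozenset())])
```

With the exit entry in place, the walk ends at the second svc at 0x1c; with an empty set, it runs on into the string bytes at 0x20, which is the behaviour seen in the listing. The svc bytes are identical in both calls, so the difference comes entirely from the register state and the operating system's syscall table.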
There's some related discussion in #3936.