
Apple M1 / AArch64 .data section not recognised as such

Open ryanmkurtz opened this issue 3 years ago • 4 comments

Discussed in https://github.com/NationalSecurityAgency/ghidra/discussions/3658

Originally posted by p-Wave November 20, 2021

Hi all,

I have the following "Hello World" code:

.global _start
.align 2

.text
_start: mov X0, 1
        adrp X1, helloworld@PAGE
        mov X2, 13
        mov X16, 4
        svc 0x80

        mov     X0, 0
        mov     X16, 1
        svc     0x80
.data
helloworld:      .ascii  "Hello World!\n"

which I compile with

Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.1.0

The CodeBrowser in Ghidra doesn't recognise the data section, but instead gives me the following interpretation (the data starts at 0x20):

                             //
                             // __text 
                             // __TEXT
                             // ram:00000000-ram:0000001f
                             //
                             **************************************************************
                             *                                                            *
                             *  FUNCTION                                                  *
                             **************************************************************
                             undefined ltmp0()
             undefined         w0:1           <RETURN>
                             _start                                          XREF[1]:     Entry Point(*)  
                             ltmp0
        00000000 20 00 80 d2     mov        x0,#0x1
        00000004 01 00 00 90     adrp       x1,0x0
        00000008 a2 01 80 d2     mov        x2,#0xd
        0000000c 90 00 80 d2     mov        x16,#0x4
        00000010 01 10 00 d4     svc        0x80
        00000014 00 00 80 d2     mov        x0,#0x0
        00000018 30 00 80 d2     mov        x16,#0x1
        0000001c 01 10 00 d4     svc        0x80
                             //
                             // __data 
                             // __DATA
                             // ram:00000020-ram:0000002c
                             //
                             ltmp1
                             helloworld
        00000020 48 65 6c 6c     ldnp       d8,d25,[x10, #-0x140]
        00000024 6f 20 57 6f     umlal2     v15.4S,v3.8H,v7.H[0x1]
        00000028 72              ??         72h    r
        00000029 6c              ??         6Ch    l
        0000002a 64              ??         64h    d
        0000002b 21              ??         21h    !
        0000002c 0a              ??         0Ah

What am I missing or doing wrong?

Thank you very much!

ryanmkurtz avatar Nov 22 '21 17:11 ryanmkurtz

This looks to be an issue with the way we handle the SVC instruction. I assume it is not meant to return in your example, right?

ryanmkurtz avatar Nov 22 '21 18:11 ryanmkurtz

svc is treated as a supervisor-level call instruction, which in many uses returns and continues. This specific case is like calling a non-returning function. Unfortunately, Ghidra's AARCH64 svc semantics use a fall-through pcodeop, CallSupervisor, which prevents a flow override from being applied. Ideally, the semantics for this instruction would be changed to give it a call flow, which would allow a non-returning flow override to be applied in this case.

The same potential issue applies to the hvc and smc instructions.
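
To illustrate the difference between a fall-through flow and a non-returning one, here is a toy Python model. It is not Ghidra code: `follow_flow` and the constants are made up for illustration, and the addresses are taken from the listing in the original post. It only tracks which addresses a linear flow-follower would visit; it does not decode or validate any instruction.

```python
def follow_flow(entry, end, non_returning=()):
    """Return the addresses a simple flow-following disassembler would visit.

    Assumption: every instruction is 4 bytes and falls through to the next
    one, unless its address is listed in non_returning.
    """
    visited = []
    addr = entry
    while addr < end:
        visited.append(addr)
        if addr in non_returning:
            break  # flow stops here, like a call to a non-returning function
        addr += 4
    return visited


TEXT_END = 0x20   # __text covers 0x00-0x1f in the listing
IMAGE_END = 0x2D  # __data covers 0x20-0x2c in the listing

# svc modelled as plain fall-through: flow runs on into __data.
as_fallthrough = follow_flow(0x00, IMAGE_END)
spilled = [a for a in as_fallthrough if a >= TEXT_END]

# Second svc (at 0x1c) modelled as non-returning: flow stops inside __text.
with_override = follow_flow(0x00, IMAGE_END, non_returning={0x1C})

print([hex(a) for a in spilled])
print(hex(with_override[-1]))
```

Any address in `spilled` is data that would be treated as code. With the second svc marked as non-returning, the last visited address is the svc itself.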

ghidra1 avatar Dec 01 '21 15:12 ghidra1

A similar issue applies to ARM, where the svc and swi instructions use the fall-through software_interrupt pcodeop.

ghidra1 avatar Dec 01 '21 15:12 ghidra1

From your snippet, it looks like the .data section is being recognized by Ghidra. Ghidra will follow flow while disassembling, even if that takes it into a non-executable section. In this case, it's following flow that doesn't actually exist.
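
As a quick sanity check that the bytes shown as ldnp/umlal2 really are the string, the 13 bytes at 0x20-0x2c from the listing can be decoded as ASCII. This is just a small Python sketch; the variable names are arbitrary.

```python
# Bytes at 0x20-0x2c, copied from the listing in the original post.
data_bytes = bytes.fromhex("48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a")

text = data_bytes.decode("ascii")
print(repr(text))        # 'Hello World!\n'
print(len(data_bytes))   # 13, matching the length passed in x2 (#0xd)
```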

The issue is that the first instance of the svc instruction is a call to write and the second is a call to exit. exit is a non-returning function, so bytes after calls to it shouldn't be disassembled in general. Basically, knowing what the svc instruction does involves more than the bytes of the instruction - it also depends on the "environment" of the program. We're working on adding a system call analyzer which would automatically figure all of this out during analysis. This is still very much a work in progress.
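
A rough sketch of what "depends on the environment" means here, in Python rather than as a Ghidra script. It walks the eight instruction words from the listing, tracks constants loaded by MOVZ, and looks up the value in x16 at each svc. The assumptions are mine: `classify_svcs` and `SYSCALLS` are hypothetical names, the table only contains the two Darwin BSD syscall numbers used in this example (1 = exit, 4 = write), and the tracking is straight-line only, so it would not cope with branches or other ways of setting x16.

```python
import struct

# The 8 instructions of __text, as the byte sequences shown in the listing.
TEXT = bytes.fromhex(
    "200080d2" "01000090" "a20180d2" "900080d2" "011000d4"
    "000080d2" "300080d2" "011000d4"
)

# Assumption: only the two Darwin syscalls from this example.
# Maps syscall number -> (name, returns to caller?)
SYSCALLS = {1: ("exit", False), 4: ("write", True)}


def classify_svcs(code, base=0):
    """Return (address, syscall number, name, returns) for each svc found."""
    regs = {}     # register index -> last constant loaded by MOVZ
    results = []
    for off in range(0, len(code), 4):
        (word,) = struct.unpack_from("<I", code, off)
        if (word & 0xFF800000) == 0xD2800000:      # MOVZ Xd, #imm16, LSL #(16*hw)
            rd = word & 0x1F
            hw = (word >> 21) & 0x3
            regs[rd] = ((word >> 5) & 0xFFFF) << (16 * hw)
        elif (word & 0xFFE0001F) == 0xD4000001:    # SVC #imm16
            num = regs.get(16)                     # syscall number is in x16
            name, returns = SYSCALLS.get(num, ("unknown", True))
            results.append((base + off, num, name, returns))
    return results


for addr, num, name, returns in classify_svcs(TEXT):
    print(f"{addr:#04x}: svc -> {name} (x16={num}), returns={returns}")
```

The same svc 0x80 encoding ends up classified differently at 0x10 and 0x1c, purely because of what was placed in x16 beforehand.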

At the moment, system calls can in some cases be handled manually or by a script - see ResolveX86orX64LinuxSystemCalls.java for an example. Unfortunately, this only works well if the pcode for the relevant instruction is in a certain form. In this case it's not, but I'll try to get a fix in for that. Once that's in place, you could follow the script to handle the system calls.

There's some related discussion in #3936.

ghidracadabra avatar Sep 09 '22 18:09 ghidracadabra