ghidra
ghidra copied to clipboard
Auto analysis/detection of jumps into the middle of another instruction
I noticed this item in the Ghidra 10.4 change history: "Listing. Added ability to reduce an instructions length to facilitate overlapping instructions. This can now be accomplished by specifying an instruction length override on the first instruction and disassembling the bytes which follow it. The need for this has been observed with x86 where there may be a flow around a LOCK prefix byte. (GP-3256)"
Many years ago I disassembled some Motorola 68000 code which did something kind of related. Some versions of the SAS/C compiler for the Commodore Amiga can generate code sequences like this:
[test some condition]
bne.b label+2
moveq #123,D0
label cmpi.w #$7000,D0 ;Note: $7000 is opcode for moveq #0,D0
[do stuff depending on value in D0]
rts
The effect of that to put either 123 or 0 in D0. In that example, if 123 is put in D0, the cmpi.w #$7000,D0
instruction is effectively a no-op (following code doesn't depend on condition codes), so code execution falls through.
However if the branch to label+2 is taken, that does moveq #0,D0
before continuing. Effectively, it's a minor optimization of this:
[test some condition]
bne.b lab2
moveq #123,D0
bra.b cont
lab2 moveq #0,D0
cont [do stuff depending on value in D0]
rts
(cmpi.w
executes faster than bra.b
I guess.)
In Ghidra, one example disassembly initially shows:
0021f34c 4e ae ff ac jsr (-0x54,A6)
0021f350 4f ef 00 1c lea (0x1c,SP),SP
0021f354 4a 80 tst.l D0
0021f356 67 0a beq.b LAB_0021f360+2
0021f358 4a ac 19 dc tst.l (0x19dc,A4)=>DAT_00224920
0021f35c 66 04 bne.b LAB_0021f360+2
0021f35e 70 01 moveq #0x1,D0
LAB_0021f360+2 XREF[0,2]: 0021f356(j), 0021f35c(j)
0021f360 0c 40 70 00 cmpi.w #0x7000,D0w
0021f364 2e 00 move.l D0,D7
0021f366 67 18 beq.b LAB_0021f380
0021f368 48 6c 1a 1a pea (0x1a1a,A4)=>DAT_0022495e
...
After right-clicking cmpi.w #0x7000,D0w
, choosing "Modify Instruction Length..." and setting length to 2 bytes, the listing looks like this:
...
0021f356 67 0a beq.b LAB_0021f362
0021f358 4a ac 19 dc tst.l (0x19dc,A4)=>DAT_00224920
0021f35c 66 04 bne.b LAB_0021f362
0021f35e 70 01 moveq #0x1,D0
0021f360 0c 40 70 00 cmpi.w #0x7000,D0w
-- Length Override: 2 (actual length is 4)
-- Fallthrough: 0021f364
LAB_0021f362 XREF[2]: 0021f356(j), 0021f35c(j)
0021f362 70 ?? 70h p
0021f363 00 ?? 00h
LAB_0021f364 XREF[1]: 0021f360
0021f364 2e 00 move.l D0,D7
...
The bytes at LAB_0021f362 are not disassembled by default, even though LAB_00212362 is a jump destination.
This issue is to suggest a couple of improvements:
- have auto analysis detect and automatically mark instructions as "split"
- if the user manually does it, automatically disassemble the bytes (70 00 here) rather than leaving as data.
If anyone is interested in working on that I can upload some Amiga executables which contain those code sequences.
The first program I found this instruction sequence in was PackDev. Its source code is available for reference.
I think there's only one instance in that particular program, which is where the disassembly fragments above are from. I added ghidra_amiga_ldr to Ghidra to enable it to understand the Amiga hunk executable format. But you could probably just load an executable as a binary image too.
By the way, after Ghidra analyzed PackDev, I did notice an issue with it thinking FUN_00220ee2 does not return. Meaning that all calls to it are "broken" like this after auto-analysis:
00220534 41 ec 01 9a lea (0x19a,A4)=>s_Not_enough_memory_002230de,A0 = "Not enough memory\n"
00220538 61 00 09 a8 bsr.w FUN_00220ee2 undefined FUN_00220ee2()
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)
0022053c 70 ?? 70h p
0022053d 14 ?? 14h
0022053e 60 ?? 60h `
0022053f 00 ?? 00h
00220540 09 ?? 09h
00220541 98 ?? 98h
LAB_00220542 XREF[1]: 00220532(j)
00220542 20 2c 1c 94 move.l (0x1c94,A4)=>DAT_00224bd8,D0
00220546 29 40 1b f8 move.l D0,(0x1bf8,A4)=>DAT_00224b3c
I can open a separate issue for that if needed.
- have auto analysis detect and automatically mark instructions as "split"
Do to the frequent occurance of bad offcut flows we are currently not considering any generalized attempt at performing such fixups automatically.
- if the user manually does it, automatically disassemble the bytes (70 00 here) rather than leaving as data.
At this point in time the action is intended for more surgical manipulation and we do not want to tie auto-disassembly to the action. We are considering the intoduction of a script (which could be attached to a key-binding) which will attempt all modifications for an instruction which contains an offcut flow to produce overlapping instructions.