ARM ELF: wrong detection of the instruction mode
Work environment
| Questions | Answers |
|---|---|
| OS/arch/bits (mandatory) | - |
| File format of the file you reverse (mandatory) | ELF |
| Architecture/bits of the file (mandatory) | ARM |
| `rizin -v` full output, not truncated (mandatory) |  |
Expected behavior
Detect the instruction mode (ARM or Thumb) automatically
Actual behavior
```
[0x000081f0]> pdf @ 0x81f0
╭ fcn.000081f0(int32_t arg1, int32_t arg2);
│ ; arg int32_t arg1 @ r0
│ ; arg int32_t arg2 @ r1
│ 0x000081f0 2100008a smlabbhs r0, sl, r0, r0
│ 0x000081f4 588358aa stmpl r3, {r1, r3, r5, r7, fp, ip, lr} ; arg2
│ 0x000081f8 4293d001 addsmi sp, r3, 1
│ 0x000081fc 2000e003 andhs lr, r0, r3 ; arg1
│ 0x00008200 31012904 mrslo r2, apsr
│ 0x00008204 dbf52001 blle 0xffd50210
│ ;-- syscall.0.8208:
│ 0x00008208 4f824684 svcmi 0x824684
│ 0x0000820c 20003f40 andhs r3, r0, r0, asr 30
│ 0x00008210 00a30081 adceq r0, r3, r1, lsl 1
│ 0x00008214 181a0092 ldmdane sl, {r1, r4, r7}
│ 0x00008218 58693001 stmdapl sb!, {r0, ip, sp} ^
│ 0x0000821c 280450b9 stmdahs r4, {r0, r3, r4, r5, r7, ip, lr}
│ 0x00008220 dbf74660 blle 0xffdd9ba8
│ 0x00008224 4330d100 teqmi r0, 0, 2
│ 0x00008228 26013401 strhs r3, [r1], -r1, lsl 8
╰ 0x0000822c 2c04dbba invalid
[0x000081f0]>
```
Steps to reproduce the behavior
```
$ rizin -A 2048-P2K-AHI_EP1.elf
```
Just to make sure I fully understand the task before I attempt this: is the instruction at 0x8200 what you are referring to as mode-changing?
I'm a bit rusty on ARM, but from what I can tell, it moves a value into the lower half of the APSR (which contains the CPSR) and sets bit 5 of the CPSR to enable Thumb mode.
@TheN00bBuilder No, in this particular case the whole function (like many others in that file) is Thumb:
```
[0x000081f0]> afB 16
[0x000081f0]> pdf
╭ fcn.000081f0();
│ ╎ ; var int32_t arg2 @ r1
│ ╎ ; var int32_t arg1 @ r0
│ ╎ 0x000081f0 2100 movs r1, 0
│ ╭──> 0x000081f2 008a lsls r2, r1, 2
│ ╎╎ 0x000081f4 5883 ldr r3, [r0, r2] ; arg2
│ ╎╎ 0x000081f6 58aa ldr r2, [r5, r2]
│ ╎╎ 0x000081f8 4293 cmp r3, r2
│ ╭───< 0x000081fa d001 beq 0x8200
│ │╎╎ 0x000081fc 2000 movs r0, 0 ; arg1
│ ╭────< 0x000081fe e003 b syscall.0.8208
│ │╰───> 0x00008200 3101 adds r1, 1
│ │ ╎╎ 0x00008202 2904 cmp r1, 4 ; 4
│ │ ╰──< 0x00008204 dbf5 blt 0x81f2
│ │ ╎ 0x00008206 2001 movs r0, 1
│ │ ╎ ;-- syscall.0.8208:
│ ╰────> 0x00008208 4f82 ldr r7, [aav.aav.0x0000c1c8] ; [0x8414:4]=0xc1c8 aav.0x0000c1c8
│ ╎ 0x0000820a 4684 mov ip, r0
│ ╎ 0x0000820c 2000 movs r0, 0
│ ╎ 0x0000820e 3f40 subs r7, 0x40
│ ╎ 0x00008210 00a3 lsls r3, r4, 2
│ ╭──> 0x00008212 0081 lsls r1, r0, 2
│ ╎╎ 0x00008214 181a adds r2, r3, r0
│ ╎╎ 0x00008216 0092 lsls r2, r2, 2
│ ╎╎ 0x00008218 5869 ldr r1, [r5, r1]
│ ╎╎ 0x0000821a 3001 adds r0, 1
│ ╎╎ 0x0000821c 2804 cmp r0, 4 ; 4
│ ╎╎ 0x0000821e 50b9 str r1, [r7, r2]
│ ╰──< 0x00008220 dbf7 blt 0x8212
│ ╎ 0x00008222 4660 mov r0, ip
│ ╎ 0x00008224 4330 orrs r0, r6
│ ╭──< 0x00008226 d100 bne 0x822a
│ │╎ 0x00008228 2601 movs r6, 1
│ ╰──> 0x0000822a 3401 adds r4, 1
│ ╎ 0x0000822c 2c04 cmp r4, 4 ; 4
╰ ╰─< 0x0000822e dbba blt $b ; sym._b_0x81a6
[0x000081f0]>
```
Gotcha, makes sense: it sounds like we're more concerned with finding the ARM/Thumb switches. If anyone wants to work on this, go for it; otherwise I'll give it a shot later this week.
So, if anyone has ideas (I asked in the Rizin dev Mattermost) or wants to take care of this issue, here's what I've got so far:
- ARM detects mode with its PC. If it BX/BLXes or LD/LDMs PC with an odd number, it enters Thumb mode.
- Inside the `analysis_arm_cs.c` file, `anop32()` would be a great place to add this check. It already checks in other ways whether the instruction is Thumb, so the change will propagate.
- However, as far as I can tell, no register "context" is stored during this analysis process, which makes this very difficult to track.

I'd appreciate it if someone more familiar with Rizin could tell me what has been done previously to track register values during analysis like this, or whether there's some struct or member I can access that may give hints about what's happening.
> ARM detects mode with its PC. If it BX/BLXes or LD/LDMs PC with an odd number, it enters Thumb mode.

Are you sure this assumption is correct? Is it defined in the ISA somewhere? Because 0x000081f0 is definitely not an odd number, jumps there wouldn't trigger Thumb mode under this logic.
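According to section A2.3.2 of the ARMv7 reference manual (page A47), switching to Thumb state happens when an address with its LSB set to 1 is written to the PC. The processor doesn't start executing at that odd address, however: the LSB selects Thumb state, and execution continues at the address with that bit cleared (I should have made that clear in my first post).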
https://developer.arm.com/documentation/ddi0406/latest/
Also, in the binary attached to this issue, look at the entry point: the first instructions load 0x81F1 into IP, which is then written to PC, but Rizin still disassembles the target in ARM mode.
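
A minimal C sketch of the interworking rule described above, just to make the arithmetic concrete: bit 0 of the value written to `PC` by an interworking branch (`BX`/`BLX` with a register, or a load into `PC`) selects the instruction set, and execution continues at the address with bit 0 cleared. The names `InstrSet` and `interwork_target` are hypothetical, not Rizin API; the 16/32 values only follow Rizin's `asm.bits` convention (16 = Thumb, 32 = ARM).

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the Rizin API.
 * The numeric values follow Rizin's asm.bits convention. */
typedef enum {
	INSTR_SET_THUMB = 16,
	INSTR_SET_ARM = 32,
} InstrSet;

/* Interpret a value written to PC by an interworking branch
 * (BX/BLX with a register, or a load into PC).
 * Bit 0 selects the instruction set; the address that is actually
 * fetched has bit 0 cleared. */
static uint32_t interwork_target(uint32_t value, InstrSet *set) {
	*set = (value & 1u) ? INSTR_SET_THUMB : INSTR_SET_ARM;
	return value & ~1u;
}
```

For the entry point mentioned above, `interwork_target(0x81F1, &set)` gives `0x81F0` with `set == INSTR_SET_THUMB`, which is consistent with `fcn.000081f0` only disassembling sensibly as Thumb. The sketch ignores the ARM-state case where bit 1 is set, which the architecture treats as UNPREDICTABLE.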
For a first detection method, I think it is enough if it only works on binaries with a given entry point. Otherwise it gets too complicated (for a start).
I think the way I would address it is roughly the following:
- Add something like `RZ_ANALYSIS_OP_TYPE_CTX_SWITCH` to `_RzAnalysisOpType` in `rz_analysis.h`.
- In `anop32` and `anop64`, you can check whether the instruction is a `jump` (writes to `PC`) with something like `cs_insn_group(handle, insn, ARM_GRP_JUMP)`. Or you can check whether `PC` is in the `regs_write` list.
  - Rizin uses Capstone for ARM/AArch64 disassembly. Check out the `cs_insn` and `cs_arm` structs in the `capstone.h` header.
- You can check whether the operands have the LSB set. This check depends on the instruction, of course. You can check how `ARM_INS_BL` sets the jump target. If it is an indirect jump, we are out of luck, because the register contents are not known.
- If the target address has the LSB set, add the flag (`RZ_ANALYSIS_OP_TYPE_CTX_SWITCH`) to `op->type`.

Now, when loading the binary, you can disassemble the instruction at the entry point and check whether `RZ_ANALYSIS_OP_TYPE_CTX_SWITCH` is set.
Also, limit yourself to ARM for now. I just mentioned AArch64 because it is probably affected as well.
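
One way to picture the plan above, as a self-contained C sketch. The op structure and flag are hypothetical stand-ins: the real structure would be Rizin's `RzAnalysisOp`, `RZ_ANALYSIS_OP_TYPE_CTX_SWITCH` does not exist yet, and the bit values here are made up. Only jumps with a statically known target are tagged, matching the point that indirect jumps can't be resolved without knowing register contents.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real structure would be RzAnalysisOp and the
 * flag a new RZ_ANALYSIS_OP_TYPE_CTX_SWITCH value. Both bit values below
 * are made up for this sketch. */
#define SKETCH_OP_TYPE_JMP        0x1u
#define SKETCH_OP_TYPE_CTX_SWITCH 0x100u

typedef struct {
	uint32_t type;         /* base type plus flag bits */
	uint64_t jump;         /* branch target as decoded from the operand */
	bool target_is_direct; /* false for jumps through a register */
} SketchOp;

/* Tag a jump whose statically known target has the LSB set.
 * Indirect jumps are skipped because register contents are unknown here. */
static void tag_ctx_switch(SketchOp *op) {
	if (!(op->type & SKETCH_OP_TYPE_JMP) || !op->target_is_direct) {
		return;
	}
	if (op->jump & 1u) {
		op->type |= SKETCH_OP_TYPE_CTX_SWITCH;
	}
}

/* Loader-side check: does the op decoded at the entry point switch mode? */
static bool entry_switches_mode(const SketchOp *entry_op) {
	return (entry_op->type & SKETCH_OP_TYPE_CTX_SWITCH) != 0;
}
```

Note that the entry point described earlier loads `0x81F1` into `IP` and then writes it to `PC`, i.e. a jump through a register, so a direct-target check like this one would not tag it on its own; that case needs the register value to be known.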
Ah, okay! Thank you so much for the guidance; having feedback from someone who's very familiar with this codebase is extremely valuable. I will get a branch up for this and start work tonight!
Currently working on this in my dev branch.
> I just mentioned AArch64 because it is probably affected as well.

From what I know, AArch64 doesn't have a Thumb mode, unless you count the AArch32 state used to run 32-bit code on 64-bit CPUs.
Overall, I am a bit skeptical of an approach that focuses on annotating the jumps instead of the code itself. There will not always be direct jumps: there can be indirect jumps, pointers in vtables, and symbols in the symbol table (especially for dynamic libraries). From what I understand, in all of those cases the LSB could be set, indicating that the target contains Thumb code. We already have `rz_analysis_hint_set_bits`, which can be used to mark a given target address as Thumb code.
So, to me, a potentially more successful strategy could be:
- At the start, before any other analysis, go through all potential entry points of the executable and mark the code at those whose address has the LSB set with `rz_analysis_hint_set_bits(16)`. Which entry points exist depends on the executable format: the real entry point of the executable, static constructor (init) tables, symbol lists (again, remember that dynamic libraries exist), and so on.
- During the recursive analysis of code, whenever an instruction produces a code xref, check the LSB of the target and, again, mark the target with `set_bits(16)`.

With regard to code xref handling, it might be necessary to clear the LSB in the code xref afterwards. Otherwise, as I have seen in some instances, the code xref points to the address with the LSB set, which is in the middle of an instruction, and this produces garbage disassembly. I'm not sure whether it happens with all types of code xrefs or only some.
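
A minimal sketch of this target-side strategy in C, under stated assumptions: `BitsHints` is a hypothetical, array-backed stand-in for the per-address bits hints that `rz_analysis_hint_set_bits()` records in Rizin, and `note_code_ref()` is a hypothetical hook that would be called for every entry point, table entry, symbol, or code xref. It marks Thumb targets at the cleaned address and returns that address, so the stored xref does not land in the middle of an instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Rizin's per-address bits hints; a real
 * implementation would use Rizin's hint storage instead of this array. */
#define MAX_HINTS 64

typedef struct {
	uint64_t addr[MAX_HINTS];
	int bits[MAX_HINTS];
	size_t count;
} BitsHints;

static void hints_set_bits(BitsHints *h, uint64_t addr, int bits) {
	for (size_t i = 0; i < h->count; i++) {
		if (h->addr[i] == addr) {
			h->bits[i] = bits; /* overwrite an existing hint */
			return;
		}
	}
	if (h->count < MAX_HINTS) {
		h->addr[h->count] = addr;
		h->bits[h->count] = bits;
		h->count++;
	}
}

/* Returns the hinted bits for addr, or default_bits when there is no hint. */
static int hints_get_bits(const BitsHints *h, uint64_t addr, int default_bits) {
	for (size_t i = 0; i < h->count; i++) {
		if (h->addr[i] == addr) {
			return h->bits[i];
		}
	}
	return default_bits;
}

/* Hypothetical hook for every code reference. If the LSB is set, mark the
 * real instruction address as Thumb (16) and return that cleaned address,
 * which is what the xref should store. */
static uint64_t note_code_ref(BitsHints *h, uint64_t target) {
	if (target & 1u) {
		uint64_t clean = target & ~(uint64_t)1;
		hints_set_bits(h, clean, 16);
		return clean;
	}
	return target;
}
```

Even addresses are deliberately left without a hint: Thumb code also lives at even addresses, so an even reference alone does not prove ARM code.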
Personally, I don't see much value in adding a flag to the instruction that performs the jump. Two cases where it might be useful:
- If there are problems with the order of operations. If the instruction analysis clears the LSB early, so that the rest of the analysis doesn't need special handling, then there is no other way of knowing that the target actually had the LSB set. It all depends on when the LSB is cleared and where the code calling `rz_analysis_hint_set_bits(16)` is located.
- I'm not sure how often the analyzer checks the bits hint. If it does so before disassembling each instruction, there are no issues. But if it checks it at points where something might have changed in between, having a flag might help.

This is more of a naming issue, but calling it `RZ_ANALYSIS_OP_TYPE_CTX_SWITCH` when the target is Thumb seems weird. That would mean that a Thumb-to-Thumb jump is also a context switch, but a Thumb-to-ARM jump isn't. That feels like a potential source of misunderstandings.
Looking at the existing code more, it seems Rizin already has `op->hint.new_bits`, which gets set in multiple places, in some cases next to a commented-out `rz_analysis_hint_set_bits (a, op->jump, a->bits == 32? 16 : 32);`. So it might not even be necessary to introduce a new flag or mechanism. I guess the real problem is one of the following:
- in some cases `hint.new_bits` isn't set correctly;
- some specific instructions don't set `hint.new_bits` at all;
- the code processing `hint.new_bits` is broken or missing;
- there is an issue with the analysis order: code gets analyzed before the code referencing it, and thus before the bits hint is set to the correct value for that specific address;
- it's one of the half dozen "entry-like" points an ELF executable can have, where the LSB and `hint_set_bits` aren't processed correctly; if this is the cause, there is no point touching the instruction analysis code.

After reading a bit more: the commented-out `_hint_set_bits` calls might be explained by another piece of code doing it in a single place:
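```c
/* Transfer the mode change computed during instruction analysis
 * into a bits hint at the jump target. */
if (op.hint.new_bits) {
	rz_analysis_hint_set_bits(analysis, op.jump, op.hint.new_bits);
}
```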
That said, I have no idea whether all code paths that set `hint.new_bits` later reach the code that transfers it to the analysis hints.
One minor drawback of associating the bits hint with the target address instead of the jump source is that, in theory, a single piece of code could have dual use (executed in both ARM and Thumb mode). But I don't think it's much of a problem, as no normal compiler would produce such code. Even if you were intentionally trying to write such code, it would be an impressive challenge to do it for any nontrivial piece of code without the control flow diverging into separate ARM and Thumb parts. The problem is somewhat similar to overlapping instructions on x86, and I don't think we are spending any serious effort on supporting that either.
The main cases in which Rizin should behave reasonably without manual hints:
1) a regular ARM binary compiled entirely in Thumb mode;
2) a mixed ARM/Thumb binary. In theory every C file, and thus the functions in it, could be compiled with different flags for using Thumb mode or not. More realistically, some static libraries in a binary might have been compiled in one mode and others in the other;
3) Thumb-only raw binaries. This is quite common, as many Cortex-M MCUs only support Thumb mode.

Cases 1) and 3) can be partially handled by setting the global `asm.bits` before analysis, but even then I have seen some cases of xrefs pointing into the middle of an instruction.
The more I read the related code, the less sense it makes. Either I am misunderstanding something, or it never worked and was never tested. It looks like it's doing something similar to what it should be doing, but not quite.
```c
op->jump = IMM(0) & UT32_MAX;
```
Why mask with `UT32_MAX` (which, by the way, was introduced in a commit "fixing" Thumb stuff)? Wouldn't it make sense to mask with `0xfffffffe` to fix the target address? It's not as if it was changed from `0xffe` to `0xfff` so that the LSB could be processed by later stages. Also, I expected `new_bits` to be set based on the LSB of the target address, but it's set based on either the current bits or, in some cases, the opposite of the current bits.
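
For comparison, a hypothetical C sketch of how `jump` and `new_bits` could be derived per branch kind, following the architecture rules: `B`/`BL` never change the instruction set, `BLX <imm>` always switches it, and interworking branches (`BX`/`BLX` with a register, loads into `PC`) take the new mode from bit 0 of the target value, which is then cleared with the `0xfffffffe` mask suggested above. `BranchKind`, `BranchResult`, and `resolve_branch` are illustrative names, not Rizin code, and the interworking case assumes the target value is actually known.

```c
#include <stdint.h>

/* Hypothetical branch classification, for illustration only. */
typedef enum {
	BRANCH_PLAIN,     /* B, BL: never change the instruction set */
	BRANCH_BLX_IMM,   /* BLX <imm>: always switches the instruction set */
	BRANCH_INTERWORK, /* BX/BLX <reg>, LDR/LDM into PC: bit 0 decides */
} BranchKind;

typedef struct {
	uint64_t jump; /* target address with the mode bit removed */
	int new_bits;  /* 16 = Thumb, 32 = ARM */
} BranchResult;

/* cur_bits is the mode of the branching instruction (16 or 32);
 * raw is the decoded target value. */
static BranchResult resolve_branch(BranchKind kind, int cur_bits, uint32_t raw) {
	BranchResult r;
	switch (kind) {
	case BRANCH_BLX_IMM:
		/* The mode always flips; the decoded target is already aligned,
		 * so clearing bit 0 here is only defensive. */
		r.new_bits = (cur_bits == 32) ? 16 : 32;
		r.jump = raw & 0xfffffffeu;
		break;
	case BRANCH_INTERWORK:
		r.new_bits = (raw & 1u) ? 16 : 32;
		r.jump = raw & 0xfffffffeu;
		break;
	case BRANCH_PLAIN:
	default:
		r.new_bits = cur_bits;
		r.jump = raw;
		break;
	}
	return r;
}
```

Under these rules, "the opposite of the current bits" is the expected result for `BLX <imm>`, while for register-based interworking branches the target's LSB is what decides the new mode.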
One detail I might have misunderstood is how mixed-mode executables interact with the symbol table. Looking at the example XVilka gave, it seems all the symbol entries are even, though some of them are only 2-byte aligned. So, at least in this file's symbol table, Thumb functions don't appear to be marked by the LSB. Supposedly, ELF files use special `$t` and `$a` symbols for marking Thumb and ARM regions. I have no idea how mixed ARM/Thumb executables interact with dynamic linking.
Supposedly, Rizin already has code that sets bits hints based on the `$t` and `$a` flags. And there are indeed a bunch of 16/32-bit hints set after analysis (although I'm not sure which ones come from which source of information). But something isn't right: some addresses that carry the `$t` symbol have a 32-bit hint, when they should have 16 (Thumb). Either the code that reads `$t` and `$a` doesn't work, or something later overwrites the hints with wrong values.
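
For reference when checking that code, a simplified C sketch of how mapping symbols are meant to be read: each `$a` (ARM), `$t` (Thumb), or `$d` (data) symbol, optionally with a suffix such as `$t.1`, sets the state from its address up to the next mapping symbol. `MappingSym` and `bits_at` are hypothetical helpers, not Rizin's implementation; the sketch assumes the symbols are already sorted by address and ignores section boundaries.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mapping symbol as read from the ELF symbol table. */
typedef struct {
	uint64_t addr;
	const char *name; /* "$a", "$t", "$d", or a suffixed form like "$t.1" */
} MappingSym;

/* Returns 32 for "$a", 16 for "$t", 0 for "$d", -1 if not a mapping symbol. */
static int mapping_kind(const char *s) {
	if (s[0] != '$' || s[1] == '\0' || (s[2] != '\0' && s[2] != '.')) {
		return -1;
	}
	switch (s[1]) {
	case 'a': return 32;
	case 't': return 16;
	case 'd': return 0;
	default: return -1;
	}
}

/* Bits in effect at addr: the kind of the last mapping symbol at or before
 * addr, or 0 if there is none. syms must be sorted by address. */
static int bits_at(const MappingSym *syms, size_t n, uint64_t addr) {
	int bits = 0;
	for (size_t i = 0; i < n && syms[i].addr <= addr; i++) {
		int kind = mapping_kind(syms[i].name);
		if (kind >= 0) {
			bits = kind;
		}
	}
	return bits;
}
```

With a lookup like this, an address covered by a `$t` symbol should always come out as 16, so a 32-bit hint at such an address would point to the hints being overwritten later.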
Even if I manually set the analysis bits hint for some part of the code to 16, it still shows ARM-mode disassembly instead of Thumb. Just a guess, but this might be closer to the true cause of the errors: if bits hints are ignored, it's no surprise that further analysis based on the wrong disassembly mode produces even more junk.
OK, sometimes changing the bits hint works and Thumb disassembly starts appearing, but only after a few incorrectly disassembled instructions. I just noticed one more unusual detail in the example binary: it's a big-endian ARM executable. Maybe the actual problems are caused by ARM endianness handling. At least that explains some of the weirdness I saw when comparing things against manually disassembled parts of the code.
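
A small C sketch of why byte order matters here, assuming the bytes at 0x81f0 in the file are `21 00 00 8a`, as the hex column of both listings above suggests. Read big-endian, the first halfword is `0x2100` (the `movs r1, 0` in the Thumb listing) and the word is `0x2100008a` (the value decoded in the ARM listing); read little-endian, they become `0x0021` and `0x8a000021`. The helper names are hypothetical. ARM has also had two big-endian conventions (BE-32 on older cores, BE-8 introduced with ARMv6) that differ in whether instruction bytes are swapped, so which reading is correct depends on how the binary was built.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read one 16-bit Thumb halfword from raw bytes. */
static uint16_t read_thumb_halfword(const uint8_t *p, bool big_endian) {
	return big_endian ? (uint16_t)((p[0] << 8) | p[1])
	                  : (uint16_t)((p[1] << 8) | p[0]);
}

/* Hypothetical helper: read one 32-bit ARM word from raw bytes. */
static uint32_t read_arm_word(const uint8_t *p, bool big_endian) {
	if (big_endian) {
		return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	}
	return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}
```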
Or is XVilka just trolling us with some weird CTF executable that does funny stuff with endianness switching, not just mixing Thumb and ARM modes?
No, this binary doesn't change endianness, at least not from what I know. In fact, I noticed similar behavior on other binaries for Cortex-M cores.