rizin ARM ELF wrong detection of the mode

Work environment

Questions	Answers
OS/arch/bits (mandatory)	-
File format of the file you reverse (mandatory)	ELF
Architecture/bits of the file (mandatory)	ARM
`rizin -v` full output, not truncated (mandatory)

Expected behavior

Detect instructions mode automatically

Actual behavior

[0x000081f0]> pdf @ 0x81f0
╭ fcn.000081f0(int32_t arg1, int32_t arg2);
│           ; arg int32_t arg1 @ r0
│           ; arg int32_t arg2 @ r1
│           0x000081f0      2100008a       smlabbhs r0, sl, r0, r0
│           0x000081f4      588358aa       stmpl r3, {r1, r3, r5, r7, fp, ip, lr} ; arg2
│           0x000081f8      4293d001       addsmi sp, r3, 1
│           0x000081fc      2000e003       andhs lr, r0, r3            ; arg1
│           0x00008200      31012904       mrslo r2, apsr
│           0x00008204      dbf52001       blle  0xffd50210
│           ;-- syscall.0.8208:
│           0x00008208      4f824684       svcmi 0x824684
│           0x0000820c      20003f40       andhs r3, r0, r0, asr 30
│           0x00008210      00a30081       adceq r0, r3, r1, lsl 1
│           0x00008214      181a0092       ldmdane sl, {r1, r4, r7}
│           0x00008218      58693001       stmdapl sb!, {r0, ip, sp} ^
│           0x0000821c      280450b9       stmdahs r4, {r0, r3, r4, r5, r7, ip, lr}
│           0x00008220      dbf74660       blle  0xffdd9ba8
│           0x00008224      4330d100       teqmi r0, 0, 2
│           0x00008228      26013401       strhs r3, [r1], -r1, lsl 8
╰           0x0000822c      2c04dbba       invalid
[0x000081f0]>

Steps to reproduce the behavior

$ rizin -A 2048-P2K-AHI_EP1.elf

2048-P2K-AHI_EP1.elf.gz

Mar 15 '24 11:03 XVilka

Just making sure I fully understand the task before I attempt this, but the instruction at 0x8200 is what you reference as mode changing?

I'm a bit rusty on ARM, but from what I can tell it moves a value into the lower-half of the APSR (which contains the CPSR) and sets bit 5 of the CPSR to enable thumb mode.

Jun 11 '24 02:06 TheN00bBuilder

@TheN00bBuilder no, in this particular case the whole function (as many others in that file) is Thumb:

[0x000081f0]> afB 16
[0x000081f0]> pdf
╭ fcn.000081f0();
│       ╎   ; var int32_t arg2 @ r1
│       ╎   ; var int32_t arg1 @ r0
│       ╎   0x000081f0      2100           movs  r1, 0
│      ╭──> 0x000081f2      008a           lsls  r2, r1, 2
│      ╎╎   0x000081f4      5883           ldr   r3, [r0, r2]          ; arg2
│      ╎╎   0x000081f6      58aa           ldr   r2, [r5, r2]
│      ╎╎   0x000081f8      4293           cmp   r3, r2
│     ╭───< 0x000081fa      d001           beq   0x8200
│     │╎╎   0x000081fc      2000           movs  r0, 0                 ; arg1
│    ╭────< 0x000081fe      e003           b     syscall.0.8208
│    │╰───> 0x00008200      3101           adds  r1, 1
│    │ ╎╎   0x00008202      2904           cmp   r1, 4                 ; 4
│    │ ╰──< 0x00008204      dbf5           blt   0x81f2
│    │  ╎   0x00008206      2001           movs  r0, 1
│    │  ╎   ;-- syscall.0.8208:
│    ╰────> 0x00008208      4f82           ldr   r7, [aav.aav.0x0000c1c8] ; [0x8414:4]=0xc1c8 aav.0x0000c1c8
│       ╎   0x0000820a      4684           mov   ip, r0
│       ╎   0x0000820c      2000           movs  r0, 0
│       ╎   0x0000820e      3f40           subs  r7, 0x40
│       ╎   0x00008210      00a3           lsls  r3, r4, 2
│      ╭──> 0x00008212      0081           lsls  r1, r0, 2
│      ╎╎   0x00008214      181a           adds  r2, r3, r0
│      ╎╎   0x00008216      0092           lsls  r2, r2, 2
│      ╎╎   0x00008218      5869           ldr   r1, [r5, r1]
│      ╎╎   0x0000821a      3001           adds  r0, 1
│      ╎╎   0x0000821c      2804           cmp   r0, 4                 ; 4
│      ╎╎   0x0000821e      50b9           str   r1, [r7, r2]
│      ╰──< 0x00008220      dbf7           blt   0x8212
│       ╎   0x00008222      4660           mov   r0, ip
│       ╎   0x00008224      4330           orrs  r0, r6
│      ╭──< 0x00008226      d100           bne   0x822a
│      │╎   0x00008228      2601           movs  r6, 1
│      ╰──> 0x0000822a      3401           adds  r4, 1
│       ╎   0x0000822c      2c04           cmp   r4, 4                 ; 4
╰       ╰─< 0x0000822e      dbba           blt   $b                    ; sym._b_0x81a6
[0x000081f0]>

Jun 11 '24 04:06 XVilka

Gotcha, makes sense - it sounds like we're more worried about finding the ARM/THUMB switches. If anyone wants to work this go for it, else I'll give it a shot later this week.

Jun 20 '24 01:06 TheN00bBuilder

So if anyone has any ideas (I asked in the Rizen dev Mattermost) or wants to take care of this issue, here's what I've got so far:

ARM detects mode with its PC. If it BX/BLXes or LD/LDMs PC with an odd number, it enters Thumb mode.
Inside the analysis_arm_cs.c file, anop32() would be a great place to add this check. It checks other ways on if the instruction is Thumb so the change will propogate.
However... as far as I can tell there's no "context" of registers stored in this analysis process. This makes it super difficult to track.

I'd appreciate if someone who's more familiar with Rizen would know what has been previously done to track register values during analysis like this, or if there's some class / member that I can access that may give hints to what's happening.

Jun 24 '24 01:06 TheN00bBuilder

ARM detects mode with its PC. If it BX/BLXes or LD/LDMs PC with an odd number, it enters Thumb mode.

Are you sure this assumption is correct? Is this defined in the ISA somewhere? Because 0x000081f0 is definitely not an odd number. So jumps there wouldn't trigger the Thumb mode in this logic.

Jun 24 '24 08:06 Rot127

ARM detects mode with its PC. If it BX/BLXes or LD/LDMs PC with an odd number, it enters Thumb mode.

Are you sure this assumption is correct? Is this defined in the ISA somewhere? Because 0x000081f0 is definitely not an odd number. So jumps there wouldn't trigger the Thumb mode in this logic.

According to page A47 in the ARM v7 reference manual, section A2.3.2, Thumb mode context switch happens by writing an address with the LSB set to 1. It doesn’t start executing that address however (I should have made that clear in my first post).

https://developer.arm.com/documentation/ddi0406/latest/

Also looking at the binary included on the question, look at the entry point where the first instructions load 0x81F1 into IP which is written to PC, but Rizen still disassembles as ARM mode.

Jun 24 '24 12:06 TheN00bBuilder

I think for the first detection method, it is enough if it only works on binaries with given entry points. Otherwise, it gets too complicated (for the beginning).

I think the way I would address it is roughly the following:

Add something like RZ_ANALYSIS_OP_TYPE_CTX_SWITCH to _RzAnalysisOpType in rz_anlaysis.h
In anop32 and anop64 you can check if the instruction is a jump (writes to PC) with something like cs_insn_group(handle, insn, ARM_GRP_JUMP). Or you can check if PC is in the regs_write list.
- Rizin uses Capstone for ARM/AArch64 disassembly. Check out the cs_insn and cs_arm struct in the capstone.h header.
You can check the operands if they have the LSB set. This check depends on the instruction, of course. You can check how ARM_INS_BL sets the jump target. If it is an indirect jump, we are out of luck, because the reg content is not known.
If the target address has the LSB set, add the flag (RZ_ANALYSIS_OP_TYPE_CTX_SWITCH) to op->type.

Now on loading the binary you can disassemble the instruction at the entry point and check if RZ_ANALYSIS_OP_TYPE_CTX_SWITCH is set.

Also limit yourself to ARM for now. I just mentioned AArch64 because it is probably affected as well.

Jun 24 '24 12:06 Rot127

I think for the first detection method, it is enough if it only works on binaries with given entry points. Otherwise, it gets too complicated (for the beginning).

I think the way I would address it is roughly the following:

Add something like RZ_ANALYSIS_OP_TYPE_CTX_SWITCH to _RzAnalysisOpType in rz_anlaysis.h

In anop32 and anop64 you can check if the instruction is a jump (writes to PC) with something like cs_insn_group(handle, insn, ARM_GRP_JUMP). Or you can check if PC is in the regs_write list.

Rizin uses Capstone for ARM/AArch64 disassembly. Check out the cs_insn and cs_arm struct in the capstone.h header.

You can check the operands if they have the LSB set. This check depends on the instruction, of course. You can check how ARM_INS_BL sets the jump target. If it is an indirect jump, we are out of luck, because the reg content is not known.

If the target address has the LSB set, add the flag (RZ_ANALYSIS_OP_TYPE_CTX_SWITCH) to op->type.

Now on loading the binary you can disassemble the instruction at the entry point and check if RZ_ANALYSIS_OP_TYPE_CTX_SWITCH is set.

Also limit yourself to ARM for now. I just mentioned AArch64 because it is probably affected as well.

Ah, okay! Thank you so much for the guidance, having feedback from someone who’s very familiar with this codebase is extremely valuable. I will get a branch up for this and start work tonight!

Jun 24 '24 13:06 TheN00bBuilder

Currently working this in my dev branch.

Jun 25 '24 02:06 TheN00bBuilder

I just mentioned AArch64 because it is probably affected as well.

From what I know AARch64 doesn't have thumb mode, unless you count in the mode for running 32bit code on 64bit CPUs.

Overall I am bit skeptical towards an approach which focus on annotating the jumps instead of the code itself. There will not always be direct jumps, there can be indirect jumps, there can be pointers in vtable, there can be symbols in the symbol table (especially for dynamic libraries). From what I understand in all of those case the LSB could be set indicating that target contains THUMB code. We already have rz_anlysis_hint_set_bits which can be used to mark certain target address as being thumb code.

So for me a potentially more succefull strategy could be:

at the start before any other analysis start with all potential entry points to executable this stuff depends on executable format, stuff like the real entrypoint of executable, init static constructor tables, symbol lists (again remind dynamic libraries exist), mark all those parts of code with rz_analysis_hint_set_bits(16).
during the recursive anaylsis of code when encountering any instruction that produces code xref, check the LSB and again mark the target with set_bits(16)

With regards to code xref handling it might be necesarry to afterwards clear the LSB in code xref, otherwise I have seen some instances of code xref pointing to the address with LSB set which is in the middle of instruction and thus producing garbage disassembly. Not sure if it happens with all types of code xrefs or only some.

I personally don't see too much value in adding adding flag for the instruction which performs the jump. Two cases where it might be useful:

if there are problems with order of operation. If the instruction analysis clears the LSB early so that rest of analysis doesn't have to do special handling, then there is no other way of knowing that it was actually pointing to address with LSB set. It all depends on when the LSB is cleared, and where the code adding anaylsis_hint_set_bits(16) is located.
not sure how frequently the analyzer checks the bits hint, if it does it before disassembling each instruction then there are no issues. But if it does it in places where something might have changed having a flag might help.

It's more of a naming thing but calling it RZ_ANALYSIS_OP_TYPE_CTX_SWITCH when target is thumb seems weird. That would mean that thumb->thumb jump is also ctx switch, but thumb->arm isn't. That feels like potential source of misunderstandings.

Looking at the existing code more, seems rizin already has op->hint.new_bits which already gets set in multiple places. In some cases with commented out rz_analysis_hint_set_bits (a, op->jump, a->bits == 32? 16 : 32);. So it might not even be necessary to introduce a new flag or mechanism. I guess that the real problem is one of following:

in some cases hint.new_bits doesn't get set correctly
some specific instructions don't set hint.new_bits at all
code processing hint.new_bits is broken or missing
there is an issue with analysis order and code gets analyzed before other code referencing it, thus it gets analyzed before analysis_hint_bits is set to correct value for that specific address.
it's one of the half dozen "entry like" points an elf executable can have where the LSB and hint_set_bits isn't correctly processed, no point touching instruction analysis code if this was the cause

After reading a bit more the commented out_hint_set_bits, might be due to other piece of code doing in single place

if (op.hint.new_bits) {
	rz_analysis_hint_set_bits(analysis, op.jump, op.hint.new_bits);
}

With that said, I have no idea if all code paths which set hint.new_bits, later reach the part with code that transfers it to analysis_hint.

One minor drawback to focusing on bits hint being associated with target address instead of jump source is that in theory a single piece of code could in theory have dual use. But I don't think it's too much of a problem as no normal compiler would produce such stuff. And even if you intentionally are trying to write such code it would be an impressive challenge to do it for any nontrivial piece of code without diverging the control flow into ARM and Thumb parts. The problem is somewhat similar to how in x86 you can have overlapping instructions, I don't think we are spending any serious effort towards supporting that either.

The main cases in which rizin should behave reasonably without manual hints:

regular binary ARM fully compiled in thumb mode
mixed arm binary, in theory every C file and thus functions contained in it could be compiled with different flags for using or not using thumb mode. More realistically some static libs in a binary might have been compiled in one mode and others in other mode.
thumb only raw binaries . This is quite common, as many cortex-m MCUs only support the thumb mode.

Cases 1) and 3) can partially be handled by setting global asm.bits hint before analysis, but even with that I have seen some cases xref pointing in the middle of instruction.

Aug 20 '24 20:08 karliss

The more I read the related code, less it makes sense. Either I am misunderstanding something, or it never worked and was never tested. It looks like it's doing something similar to stuff it should be doing but not quite.

op->jump = IMM(0) & UT32_MAX; Why mask with UT32_MAX (that by the way was introduced in commit "fixing" thumb stuff)? Wouldn't it make sense to mask with 0xfffffffe to fix the target address? It's not like it was changed from 0xffe to 0xfff so that LSB can be processed by later stages. Also I expected new_bits to be set based on target address LSB, but it's done based on either current bits, or in some cases the opposite of current bits.

Aug 20 '24 21:08 karliss

One detail I might have misunderstood with regards to how mixed mode executables interact with symbol table. Looking at the example XVIlka gave, seems like all the symbol entries are always even, but some of them are 2 bytes aligned. So at least in the ELF symbol table thumb functions aren't marked by LSB. Supposedly ELF files uses special "$t" and "$a" symbols for marking thumb and arm regions. No idea how mixed arm/thumb mode executables interact with dynamic linking.

Supposedly rizin already has code which sets hint_bits based on "$t" and "$a" flags. And there are indeed a bunch of 16/32 bit hints set after analysis (although not sure which ones are set by which source of information). But something isn't right. Some address which contain the $t symbol, have 32 bit hint. They should have been 16(thumb). Either code which reads "$t" and "$a" doesn't work, or something later overwrites the hints with wrong values.

Even if I manually set the analysis hint bits for some part of code to 16, it still shows the arm mode disassembly instead of thumb. Just a guess but this might be closer to the true cause of errors. If bit hints are ignored, then there is no surprise that further analysis based on wrong disassembly mode produce even more junk.

Aug 20 '24 22:08 karliss

Ok, sometimes changing hint bits work and start producing thumb disassembly but only after few instructions of incorrect disassembly. I just noticed one more unusual detail in the example binary. It's a big endian arm executable. Maybe actual problems are caused by arm endianness handling. At least that explains some of weirdness when I was comparing stuff against manually disassembled parts of code.

Or is XVIilka is just trolling us with some weird CTF executable that does funny stuff with endianes switching not just mixing thumb and arm mode.

Aug 20 '24 23:08 karliss

No, this binary doesn't change endianness, at least not from what I know. In fact, I noticed similar behavior on other binaries for Cortex-M cores.

Aug 21 '24 00:08 XVilka