radare2
radare2 copied to clipboard
Arm64 adrp instruction disassembly incorrect
Environment
Tue 12 Mar 2024 03:58:30 PM CET
radare2 5.8.8 1 @ linux-x86-32
birth: git.5.8.8 2023-06-08__14:01:02
commit: ea7f0356519884715cf1d5fba16042bac72b2df5
options: gpl -O1 cs:5 cl:0 make
Linux x86_64
Description
disassembly of adrp instructions does not make sense. take the following code, which initializes a struct:
static union U2 func_9(int64_t p_10, const int32_t * p_11, union U2 p_12, uint64_t p_13, int32_t * p_14)
{ /* block id: 196 */
union U2 l_673 = {0x5A8A51BE3E7C0A4FLL};
...
here's radare2's disassembly:
> aa
> sf sym.func_9
> pdf
...
│ 0x08000828 011d00f9 str x1, [x8, 0x38] ; arg2
│ 0x0800082c 031900f9 str x3, [x8, 0x30] ; arg4
│ 0x08000830 041500f9 str x4, [x8, 0x28] ; arg5
│ 0x08000834 08000090 adrp x8, loc.imp.strcmp ; loc.imp.printf ; 0x8000000
│ 0x08000838 08010091 add x8, x8, 0 ; 0x8000000 ; loc.imp.printf
│ 0x0800083c 0001c03d ldr q0, [x8] ; 0xe2 ; 226
│ 0x08000840 2001803d str q0, [x9]
addrp takes an offset, the resulting address is calculated relative to the current instruction. i assume that r2 interprets the offset as an absolute address, resulting in the disassembly showing "loc.imp.strcmp". Is this expected behavior? If not, can someone point me to where I can fix this? Also, is there a way to make radare2 give me the raw operand value instead of symbols?
ghidra's output makes more sense:
001007e8 01 1d 00 f9 str x1,[x8, #0x38]
001007ec 03 19 00 f9 str x3,[x8, #0x30]
001007f0 04 15 00 f9 str x4,[x8, #0x28]
001007f4 28 00 00 b0 adrp x8,0x105000
001007f8 08 41 2f 91 add x8,x8,#0xbd0
001007fc 00 01 c0 3d ldr q0,[x8]
00100800 20 01 80 3d str q0,[x9]
I don't know why the instruction bytes differ for that specific instruction, but I have verified that radare2's output corresponds to the actual file while ghidra's have changed, probably due to a preformed analysis.
Here are the files I used (source and binary), in case anyone is interested. random_39855.zip
Try r2 from git
Same behavior on current dev:
./radare2/binr/radare2/radare2 -v
radare2 5.8.9 31767 @ linux-x86-64
birth: git.5.8.8-1179-g49ff46d0a1 2024-03-12__16:47:26
commit: 49ff46d0a1717ea7ca19eaa674a2cbb9981ecdba
options: gpl -O? cs:5 cl:2 make
I was able to work around the issue by making radare2 output to json, which gives me the opcode without disassembly.
{
"offset":134219828,
"ptr":134217728,
"esil":"134217728,x8,=",
"refptr":0,
"fcn_addr":134219776,
"fcn_last":134220400,
"size":4,
"opcode":"adrp x8, 0x8000000",
"disasm":"adrp x8, loc.imp.strcmp",
"bytes":"08000090",
"family":"cpu",
"type":"lea",
"reloc":false,
"type_num":33,
"type2_num":0,
"refs":[
{
"addr":134217728, <- 0x80000000
"type":"DATA",
"perm":"r--"
}
]
},
the "refs" attribute is also very revealing here. I believe this is an issue with radare2.
yes this is "correct" and the expected behaviour, the address in 0x80000 is exactly the same address as loc.imp.strcmp. so its expected because the disassembler filters that string, the emulation shows the final computation of the value which happens in the add instruction after that adrp. and actually the 0x8000 is shown as a comment in the disassembly. SO i dont see the problem here. because the code can be doing adrp without any add and then you wont see which symbol is referencing.
And yes. the value shown by the disassembler is computed by providing the actual instruction position which shows that value. So let me clarify:
- you want an option to make adrp instructions not be filtered
- you want to see the absolute value instead of the relative
Either way, it can't be a reference to strcmp. adrp is used here to initialize structs with constant values.
The instructinos are from the following function, which doesn't reference strcmp at all:
static union U2 func_9(int64_t p_10, const int32_t * p_11, union U2 p_12, uint64_t p_13, int32_t * p_14)
{ /* block id: 196 */
union U2 l_673 = {0x5A8A51BE3E7C0A4FLL}; <- this is where the adrp happens.
for (p_12.f4.f0 = 0; (p_12.f4.f0 != 0); p_12.f4.f0 = safe_add_func_int8_t_s_s(p_12.f4.f0, 4))
{ /* block id: 199 */
uint32_t l_657 = 0xBF1DF1F3L;
struct S0 l_664 = {0xEF594491L,0x00BDF289L,1L,0x07L};
struct S0 *l_663 = &l_664;
struct S0 ** const l_662 = &l_663;
struct S0 ** const *l_661 = &l_662;
for (p_10 = 0; (p_10 <= (-27)); --p_10)
{ /* block id: 202 */
struct S0 **l_666 = &l_663;
struct S0 ***l_665[2][2][2] = {{{&l_666,&l_666},{&l_666,&l_666}},{{&l_666,&l_666},{&l_666,&l_666}}};
uint32_t l_670 = 0x85187951L;
uint32_t *l_669 = &l_670;
uint8_t l_671 = 0xF5L;
int32_t *l_672 = &l_664.f0;
int i, j, k;
(*l_672) = (l_657 == (~(((safe_mul_func_int8_t_s_s(((void*)0 == &p_12), (l_661 == (l_664.f1 , l_665[1][1][0])))) <= ((safe_rshift_func_int8_t_s_s((p_13 <= (((void*)0 != l_669) > l_671)), p_13)) > p_10)) || p_13)));
}
}
return l_673;
}
Here's how Ghidra disassembles the instruction, in case it's of any help:
Operand | x8 | 0x105000 |
---|---|---|
Labeled | x8 | 0x105000 |
Type | REG | SCAL |
Scalar | 0x105000 | |
Address | ||
Register | x8 | |
Op-Objects | x8 | const:0x105000 |
Operand Mask | 00011111 00000000 00000000 00000000 | 11100000 11111111 11111111 01100000 |
Masked Value | 00001000 00000000 00000000 00000000 | 00100000 00000000 00000000 00100000 |
Ghidra's "Basic Constant Reference Analyzer" correctly determines the constant values, and points me to the correct initial values in data.
for reference, this is the output from objdump:
7e8: f9001d01 str x1, [x8, #56]
7ec: f9001903 str x3, [x8, #48]
7f0: f9001504 str x4, [x8, #40]
7f4: 90000008 adrp x8, 0 <main>
7f8: 91000108 add x8, x8, #0x0
7fc: 3dc00100 ldr q0, [x8]
800: 3d800120 str q0, [x9]
804: f9400908 ldr x8, [x8, #16]
which is referencing "main".. what i see, is that those addresses are patched by relocs. and you can solve this if you compile an executable or a library instead of an object file. yes, r2, objdump, and many other tools have issues when processing object files, because relocs and elf objects are a big mess. the same output from objdump of the compiled executable is this:
fb8: f9002100 str x0, [x8, #64]
fbc: f9001d01 str x1, [x8, #56]
fc0: f9001903 str x3, [x8, #48]
fc4: f9001504 str x4, [x8, #40]
fc8: b0000028 adrp x8, 5000 <safe_mul_func_int16_t_s_s+0xc>
fcc: 913ee108 add x8, x8, #0xfb8
fd0: 3dc00100 ldr q0, [x8]
fd4: 3d800120 str q0, [x9]
fd8: f9400908 ldr x8, [x8, #16]
which in r2 looks like this:
[0x00000f94]> pd 20
;-- func_9:
0x00000f94 ff8304d1 sub sp, sp, 0x120
0x00000f98 fd7b10a9 stp x29, x30, [sp, 0x100]
0x00000f9c fc8b00f9 str x28, [sp, 0x110]
0x00000fa0 fd030491 add x29, sp, 0x100
0x00000fa4 a92301d1 sub x9, x29, 0x48
0x00000fa8 e91300f9 str x9, [sp, 0x20]
0x00000fac e90308aa mov x9, x8
0x00000fb0 e81340f9 ldr x8, [sp, 0x20]
0x00000fb4 e21700f9 str x2, [sp, 0x28]
0x00000fb8 002100f9 str x0, [x8, 0x40]
0x00000fbc 011d00f9 str x1, [x8, 0x38]
0x00000fc0 031900f9 str x3, [x8, 0x30]
0x00000fc4 041500f9 str x4, [x8, 0x28]
0x00000fc8 280000b0 adrp x8, 0x5000
0x00000fcc 08e13e91 add x8, x8, 0xfb8 ; str.O_n_
0x00000fd0 0001c03d ldr q0, [x8] ; 0xe2 ; str.O_n_
0x00000fd4 2001803d str q0, [x9]
0x00000fd8 080940f9 ldr x8, [x8, 0x10] ; 0xe2
0x00000fdc 280900f9 str x8, [x9, 0x10]
0x00000fe0 5f0000b9 str wzr, [x2]
[0x00000f94]> s str.O_n_
[0x00005fb8]> pxq 8
0x00005fb8 0x5a8a51be3e7c0a4f O.|>.Q.Z
[0x00005fb8]>
so the constant is there, and the root issue is how object files and relocs are handled. there's nothing wrong with adrp in here imho
looking further in this object file i've spotted a bunch of other issues:
- there are 5 different unsupported relocs used in this .o (pushed a PR to at least display some warnings) -
- applying the relocs causes a segfault because of a bug in io (pusehd another PR, @condret is reviewing to push a proper fix) - fixed by @condret https://github.com/radareorg/radare2/pull/22691
- objdump doesnt support those relocs either
will be good to extend the https://github.com/radareorg/radare2/pull/22690 PR to add support for the remaining arm64 relocs, would you like to give it a try? otherwise as a workaround i would suggest you to compile the executable or library and use it instead of the object file.
thanks for the sample files, it is ok for you if i push them into the testsuite?
found another issue in the elf imports, all of them are located in the same address because the plt section is not yet available. gonna fix that in another pr too :)
fixed here https://github.com/radareorg/radare2/pull/22692
thanks for the sample files, it is ok for you if i push them into the testsuite?
Sure. I will see what I can do about https://github.com/radareorg/radare2/pull/22690
another improvement here would be to ignore flags pointing to the base address of the loaded module. there's no point on showing a section name if that is not set. but still the right fix to do here would be to support the needed relocs properly. Let me know about adding this binary into the testsuite.
I will add your sample and add some tests now. Thanks! Let me need if you need some guiding about the relocs. The code is a little broken into different parts that patch, enumerate.. and they are actually stored 3 times. I may find some time after the release to fix that across the 90 bin plugins. But for your needs none of this should affect you and it’s only elf related
added tests here 8471613246a9073e59032e595965ca66397cecdc let me know if you need some help for implementing the relocs. feel free to join the discord or telegram channels for more interactions
ping?