ARM: `drutil_insert_get_mem_addr` no longer works correctly on AArch64
https://github.com/DynamoRIO/dynamorio/pull/5283 introduced a change to decoding of load instructions, as a result of that change load index registers can be decoded as w registers (as opposed to x registers as previously).
Not all places in DR are prepared to encounter a w register in a load index context.
For example, this logic in drutil_insert_get_mem_addr_arm
reg_id_t index = opnd_get_index(memref);
// ...
reg_id_t stolen = dr_get_stolen_reg();
// ...
} else if (index == stolen) {
index = replace_stolen_reg(drcontext, bb, where, memref, dst, scratch,
scratch_used);
}
now doesn't recognize w index registers as stolen, because dr_get_stolen_reg() typically returns an x register.
This results in instrumentation being silently incorrect and drutil_insert_get_mem_addr_arm producing incorrect addresses.
Some other places besides drutil_insert_get_mem_addr_arm may also be affected by this change.
This may have resulted in incorrect addresses in AArch64 traces gathered since Jan 18.
Is one way to check whether any instruction matches ldr.*w28 or str.*w28?
This is rather disturbing. What can we do to avoid such regressions in the future?
-
Add more comprehensive regression tests for drutil_insert_get_mem_addr covering stolen register corner cases
-
Add checks for unmapped addresses to the invariant_checker trace analyzer: but this requires having metadata on the valid mapped regions, which are not gathered by the Github drmemtrace code today. This is no guarantee as the computed incorrect address could happen to land in a mapped region.
Is one way to check whether any instruction matches
ldr.*w28orstr.*w28?
The w28 must be an index, so it would be ldr.*, w28 or str.*, w28 for arm-style disassembly -- right? Planning to search through objdump output as an initial check.
This is rather disturbing. What can we do to avoid such regressions in the future?
It is! Adding more comprehensive regression tests for drutil_insert_get_mem_addr() covering stolen register corner cases is definitely something we should do.
In general, are there areas of DR core and API code which rely heavily on the correctness of generated instructions which should also have more comprehensive test coverage? Can we identify a set of INSTR_CREATE_ macros and DR functions which need better test coverage.
Can we identify these code paths and improve error trapping with e.g. ASSERTs, to fail loudly rather than silently go undetected?
Does #2443 cover enough of this?
So the #5283 fix itself was addressing incorrect addresses which I don't remember realizing when that fix went in: so older traces pre-Jan 18 also had a source of inaccuracies.
So the #5283 fix itself was addressing incorrect addresses which I don't remember realizing when that fix went in: so older traces pre-Jan 18 also had a source of inaccuracies.
Could this be quantified? If we want to search for load/store instructions whose addresses were computed incorrectly prior to the #5283 fix what would we search for? Is it that any scaled index register was treated as not scaled at all?
Yes, it's correlated with the extend- if the extend is UXTW or SXTW then it should be a W register , if it is SXTX or LSL/UXTX, it should be an X register
Yes, it's correlated with the extend- if the extend is UXTW or SXTW then it should be a W register , if it is SXTX or LSL/UXTX, it should be an X register
But the extend scale is also incorrect, right? In the diff for PR #5283 I see things like this: At dis-a64.txt:13145 it was:
str %x1 -> (%x2,%x3,uxtx #3)[8byte]
And now it's:
str %x1 -> (%x2,%x3,lsl #3)[8byte]
So a trace gathered prior to that PR that had a store with a scale would have incorrectly ignored the scale and computed the wrong address? (I think uxtx #3 ignores the #3, i.e., it only extends and doesn't scale/shift?)
Indeed, it is incorrect, but it as at least consistently incorrect. UXTX only appears where it should be LSL, and the correct registers can be derived from that if we want to find where the decode errors occurred pre #5283.
I'm not sure on the difference between uxtx #3 and lsl #3. I'll have to defer to @AssadHashmi on that
I'm not sure on the difference between uxtx #3 and lsl #3. I'll have to defer to @AssadHashmi on that
My understanding is that UXTX shouldn't appear in X reg offsets because zero extending a 64 bit index value to 64 bits is meaningless, a no-op extension. decode_opnd_memreg_size() in #5283 just converted UXTX to the preferred alias LSL and opnd_base_disp_scale_disassemble() fixed it in disassembler.
So a trace gathered prior to that PR that had a store with a scale would have incorrectly ignored the scale and computed the wrong address? (I think uxtx #3 ignores the #3, i.e., it only extends and doesn't scale/shift?)
I don't think that's the case. The values used to create memref with opnd_create_base_disp_aarch64() are the same including scaled and extend_type but after #5283, extend_type is disassembled as LSL rather than UXTX.
So a trace gathered prior to that PR that had a store with a scale would have incorrectly ignored the scale and computed the wrong address? (I think uxtx #3 ignores the #3, i.e., it only extends and doesn't scale/shift?)
I don't think that's the case. The values used to create memref with
opnd_create_base_disp_aarch64()are the same includingscaledandextend_typebut after #5283,extend_typeis disassembled asLSLrather thanUXTX.
OK, so a trace prior to PR #5283 would only have an incorrect address related to that PR's changes if it had an X register as an index that really should have been a W and it had a non-zero value in the upper bits? The scaling and extension changes from that PR are purely cosmetic.
The PR description of The "extension" LSL #0 was being decoded as uxtx #0, combined with the diff showing changes that weren't with just #0, made it seem like things were being decoded incorrectly, not just being disassembled incorrectly.
I did a sanity test:
After PR #5283:
derek@dynamorio:~/dr/build$ rm -rf drmemtrace.*.dir; setarch `uname -m` -R bin64/drrun -stderr_mask 0 -t drcachesim -offline -- ~/dr/test/pr5283 && bin64/drrun -stderr_mask 0 -t drcachesim -indir drmemtrace.*.dir -simulator_type view
All done
...
11: T1738749 0x0000000000400154 910003e2 add %sp $0x0000 lsl $0x00 -> %x2
12: T1738749 0x0000000000400158 d2800523 movz $0x0029 lsl $0x00 -> %x3
13: T1738749 0x000000000040015c f8237841 str %x1 -> (%x2,%x3,lsl #3)[8byte]
14: T1738749 write 8 byte(s) @ 0xfffffffff188
Going back before the PR:
derek@dynamorio:~/dr/build$ rm -rf drmemtrace.*.dir; setarch `uname -m` -R bin64/drrun -stderr_mask 0 -t drcachesim -offline -- ~/dr/test/pr5283 && bin64/drrun -stderr_mask 0 -t drcachesim -indir drmemtrace.*.dir -simulator_type view
All done
...
11: T1731912 0x0000000000400154 910003e2 add %sp $0x0000 lsl $0x00 -> %x2
12: T1731912 0x0000000000400158 d2800523 movz $0x0029 lsl $0x00 -> %x3
13: T1731912 0x000000000040015c f8237841 str %x1 -> (%x2,%x3,uxtx #3)[8byte]
14: T1731912 write 8 byte(s) @ 0xfffffffff188
Assuming my ASLR-disabling test will get the same stack address (sure seems like it; too lazy to create a different constant address instead of using SP in a tiny asm test; maybe should have used a .text address and a load) it does seem that the lsl #3 vs uxtx #3 disassembly difference made no impact on the traced address.
Through unrelated validation of virtual-to-physical address gathering, we discovered a source of incorrect addresses in our traces: instructions with w28 as the index register. It seems that the fix in #5511 did not actually result in correct addresses. (Incidentally, was that PR meant to close this issue? Was there other work that left it open?)
Creating a test case we can see that the addresses are wrong:
mov x28, sp
mov x27, sp
add x5, x27, w28, sxtw #3
ldr x4, [x27]
ldr x4, [x27, w28, sxtw #3]
I ran it without ASLR which just happens to not crash on that memref:
derek@dynamorio:~/dr/build$ pushd ~/dr/test; as -mcpu=cortex-a55 -o allasm_aarch64.o allasm_aarch64.s && gcc -static -o allasm_aarch64 allasm_aarch64.o -nostartfiles; popd
derek@dynamorio:~/dr/build$ rm -rf drmemtrace.*.dir; setarch `uname -m` -R bin64/drrun -t drcachesim -offline -- ~/dr/test/allasm_aarch64
derek@dynamorio:~/dr/build$ bin64/drrun -t drcachesim -indir drmemtrace.*.dir -simulator_type view 2>&1 | head -40
14: T820058 ifetch 4 byte(s) @ 0x0000000000400158 910003fc add %sp $0x0000 lsl $0x00 -> %x28
15: T820058 ifetch 4 byte(s) @ 0x000000000040015c 910003fb add %sp $0x0000 lsl $0x00 -> %x27
16: T820058 ifetch 4 byte(s) @ 0x0000000000400160 8b3ccf65 add %x27 %w28 sxtw $0x03 -> %x5
17: T820058 ifetch 4 byte(s) @ 0x0000000000400164 f9400364 ldr (%x27)[8byte] -> %x4
18: T820058 read 8 byte(s) @ 0x0000fffffffff040 by PC 0x0000000000400164
19: T820058 ifetch 4 byte(s) @ 0x0000000000400168 f87cdb64 ldr (%x27,%w28,sxtw #3)[8byte] -> %x4
20: T820058 read 8 byte(s) @ 0x0000ffffc0187040 by PC 0x0000000000400168
That recorded address 0x0000ffffc0187040 sure looks wrong. It should be:
(gdb) p/x 0x0000fffffffff040 + (((int) 0x0000fffffffff040)<<3)
$9 = 0xffffffff7240
Instru:
after instrumentation:
TAG 0x0000000000400158
+0 m4 @0x0000fffdf804abb0 f940d784 ldr +0x01a8(%x28)[8byte] -> %x4
+4 m4 @0x0000fffdf804aa68 d2802b05 movz $0x0158 lsl $0x00 -> %x5
+8 m4 @0x0000fffdf804a9e8 f2e40185 movk %x5 $0x200c lsl $0x30 -> %x5
+12 m4 @0x0000fffdf804a968 f9000085 str %x5 -> (%x4)[8byte]
+16 m4 @0x0000fffdf804ab30 f9000085 <label note=0x0000000000000000>
+16 m4 @0x0000fffdf804a8e8 91002084 add %x4 $0x0008 lsl $0x0000000000000000 -> %x4
+20 m4 @0x0000fffdf804a868 f900d784 str %x4 -> +0x01a8(%x28)[8byte]
+24 m4 @0x0000fffdf8049948 f900d784 <label note=0x0000000000000000>
+24 L3 @0x0000fffdf8049160 910003fc add %sp $0x0000 lsl $0x00 -> %x28
+28 L3 @0x0000fffdf804ac30 910003fb add %sp $0x0000 lsl $0x00 -> %x27
+32 L3 @0x0000fffdf8049690 8b3ccf65 add %x27 %w28 sxtw $0x03 -> %x5
+36 m4 @0x0000fffdf804a720 f940d784 ldr +0x01a8(%x28)[8byte] -> %x4
+40 m4 @0x0000fffdf804a5d8 f900009b str %x27 -> (%x4)[8byte]
+44 m4 @0x0000fffdf804a6a0 f900009b <label note=0x0000000000000000>
+44 m4 @0x0000fffdf804a510 91002084 add %x4 $0x0008 lsl $0x0000000000000000 -> %x4
+48 m4 @0x0000fffdf804a490 f900d784 str %x4 -> +0x01a8(%x28)[8byte]
+52 m4 @0x0000fffdf804a7a0 f900d784 <label note=0x0000000000000000>
+52 L3 @0x0000fffdf80497e0 f9400364 ldr (%x27)[8byte] -> %x4
+56 m4 @0x0000fffdf804a300 f940d784 ldr +0x01a8(%x28)[8byte] -> %x4
+60 m4 @0x0000fffdf804a200 f900ab80 str %x0 -> +0x0150(%x28)[8byte]
+64 m4 @0x0000fffdf804a138 f9401b80 ldr +0x30(%x28)[8byte] -> %x0
+68 m4 @0x0000fffdf804a070 8b3ccf60 add %x27 %w28 sxtw $0x0000000000000003 -> %x0
+72 m4 @0x0000fffdf8049ff0 f9000080 str %x0 -> (%x4)[8byte]
+76 m4 @0x0000fffdf804a280 f9000080 <label note=0x0000000000000000>
+76 m4 @0x0000fffdf8049f28 91002084 add %x4 $0x0008 lsl $0x0000000000000000 -> %x4
+80 m4 @0x0000fffdf8049ea8 f900d784 str %x4 -> +0x01a8(%x28)[8byte]
+84 m4 @0x0000fffdf804a3c8 f900d784 <label note=0x0000000000000000>
+84 L3 @0x0000fffdf8049778 f87cdb64 ldr (%x27,%w28,sxtw #3)[8byte] -> %x4
It's not restoring the app's x28 value, so the w28 it uses is DR's.
So all aarch64 traces have had incorrect addresses for w28 index registers, all this time.