Cannot detect self-modifying code ranges on AArch64 (Arm v9)
Describe the bug See the discussion at https://groups.google.com/g/dynamorio-users/c/72wz2q-njCg
When tracing a Java application on Arm v9 hardware, DynamoRIO does not detect changes to JIT-generated code, which causes strange failures. The root cause is that the ic_ivau instruction is no longer used for icache flushing on newer AArch64 platforms.
To Reproduce Steps to reproduce the behavior:
- Use a debug build of DynamoRIO to run a java command, with the log level set to 4:
export DYNAMO_HOME=<Debug build>
export JAVA_HOME=<java install>
$DYNAMO_HOME/bin64/drrun -root $DYNAMO_HOME -verbose -debug -loglevel 4 -c $DYNAMO_HOME/api/bin/libbbsize.so -- $JAVA_HOME/bin/jps
- Check the log output for the mangled code stubs for isb and ic_ivau:
grep ic_ivau $DYNAMO_HOME/logs/jps*/*
grep isb $DYNAMO_HOME/logs/jps*/*
- Exact output or incorrect behavior: for the ic_ivau instruction there is no output. For the isb instruction, the log contains lines like:
jps.292482.00000001/log.1.292483.html: 0x0000ffffa2492554 d5033fdf isb $0x0f
jps.292482.00000001/log.1.292483.html: +0 L3 @0x0000fffd5e7c4098 d5033fdf isb $0x0f
jps.292482.00000001/log.1.292483.html: +0 L3 @0x0000fffd5e7c4098 d5033fdf isb $0x0f
jps.292482.00000001/log.1.292483.html:forward_eflags_analysis: isb $0x0f
jps.292482.00000001/log.1.292483.html: +0 L3 @0x0000fffd5e7c4098 d5033fdf isb $0x0f
jps.292482.00000001/log.1.292483.html: 0x0000ffff5e6a6868 d5033fdf isb $0x0f
jps.292482.00000001/log.1.292483.html: 0x0000ffffa0c86824 d5033fdf isb $0x0f
In the log file, the mangled code for isb is:
before instrumentation:
TAG 0x0000ffffa2492400
+0 L3 @0x0000fffd5e7c2fc8 d5033fdf isb $0x0f
END 0x0000ffffa2492400
after instrumentation:
TAG 0x0000ffffa2492400
+0 L3 @0x0000fffd5e7c2fc8 d5033fdf isb $0x0f
END 0x0000ffffa2492400
vm_list_overlaps 0x0000fffd5e4c24d0 vs 0x0000ffffa2492400-0x0000ffffa2492401
setting cur_pc (for fall-through) to 0x0000ffffa2492404
forward_eflags_analysis: isb $0x0f
instr 0 => 0
exit_branch_type=0x0 bb->exit_target=0x0000ffffa2492404
bb ilist after mangling:
TAG 0x0000ffffa2492400
+0 L3 @0x0000fffd5e7c2fc8 d5033fdf isb $0x0f
+4 m4 @0x0000fffd5e7c4098 f9000380 str %x0 -> (%x28)[8byte]
+8 m4 @0x0000fffd5e7c3110 d28f5000 movz $0x7a80 lsl $0x00 -> %x0
+12 m4 @0x0000fffd5e7c4460 f2ae29e0 movk %x0 $0x714f lsl $0x10 -> %x0
+16 m4 @0x0000fffd5e7c2c88 b9400000 ldr (%x0)[4byte] -> %w0
+20 m4 @0x0000fffd5e7c4198 34000000 cbz @0x0000fffd5e7c3a20[8byte] %w0
+24 m4 @0x0000fffd5e7c31d8 a9008b81 stp %x1 %x2 -> +0x08(%x28)[16byte]
+28 m4 @0x0000fffd5e7c2db0 d287d202 movz $0x3e90 lsl $0x00 -> %x2
+32 m4 @0x0000fffd5e7c4218 f2ae2702 movk %x2 $0x7138 lsl $0x10 -> %x2
+36 m4 @0x0000fffd5e7c3048 d2848081 movz $0x2404 lsl $0x00 -> %x1
+40 m4 @0x0000fffd5e7c4398 f2b44921 movk %x1 $0xa249 lsl $0x10 -> %x1
+44 m4 @0x0000fffd5e7c3780 f2dfffe1 movk %x1 $0xffff lsl $0x20 -> %x1
+48 m4 @0x0000fffd5e7c4298 aa1c03e0 orr %xzr %x28 lsl $0x0000000000000000 -> %x0
+52 m4 @0x0000fffd5e7c4318 d61f0040 br %x2
+56 m4 @0x0000fffd5e7c3a20 d61f0040 <label note=0x0000000000000000>
+56 m4 @0x0000fffd5e7c3958 f9400380 ldr (%x28)[8byte] -> %x0
+60 L4 @0x0000fffd5e7c4118 14000000 b $0x0000ffffa2492404
END 0x0000ffffa2492400
Please also answer these questions:
- What happens when you run without any client?
- What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)?
The debug build without a client produces the same result.
Expected behavior The log should show ic_ivau being mangled, so that DynamoRIO can obtain the correct range of dynamically generated code.
Versions
- What version of DynamoRIO are you using? Debug build from latest DynamoRIO source, commit faa8e4b6774ee90dfac6c51cfadef2effb15abb4
- Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem? No
- What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.) Linux aarch64
> uname -a
Linux 5.10.134-12.2.al8.aarch64 #1 SMP Thu Oct 27 02:19:04 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
- Is your application 32-bit or 64-bit? 64-bit
Additional context
Pasting from the users list email:
The HotSpot JVM uses GCC's __builtin___clear_cache() to flush the icache, and on newer architectures that builtin skips "ic ivau". The code is at https://github.com/gcc-mirror/gcc/blob/releases/gcc-10/libgcc/config/aarch64/sync-cache.c and contains this comment:
/* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
Unification is not required for instruction to data coherence. */
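The effect of the DIC bit on the flush sequence can be sketched as follows. This is a minimal, portable model of the decision logic in the libgcc file linked above, not a copy of it: `cache_sync_plan_t`, `plan_cache_sync` and `count_ic_ivau` are hypothetical names, and the CTR_EL0 value is passed in by the caller rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CTR_EL0 (Arm architecture reference manual). */
#define CTR_EL0_IDC_BIT 28
#define CTR_EL0_DIC_BIT 29

/* Hypothetical summary of what a cache-sync routine has to do. */
typedef struct {
    bool needs_dc_cvau;         /* data cache clean to PoU required */
    bool needs_ic_ivau;         /* icache invalidate to PoU required */
    uint64_t icache_line_bytes; /* smallest icache line, in bytes */
    uint64_t dcache_line_bytes; /* smallest dcache line, in bytes */
} cache_sync_plan_t;

static cache_sync_plan_t
plan_cache_sync(uint64_t ctr_el0)
{
    cache_sync_plan_t plan;
    plan.needs_dc_cvau = ((ctr_el0 >> CTR_EL0_IDC_BIT) & 1) == 0;
    plan.needs_ic_ivau = ((ctr_el0 >> CTR_EL0_DIC_BIT) & 1) == 0;
    /* IminLine and DminLine hold log2 of the line size in 4-byte words. */
    plan.icache_line_bytes = UINT64_C(4) << (ctr_el0 & 0xf);
    plan.dcache_line_bytes = UINT64_C(4) << ((ctr_el0 >> 16) & 0xf);
    return plan;
}

/* Number of "ic ivau" instructions a flush of [start, end) would execute,
 * walking the range one icache line at a time from the aligned start.
 */
static uint64_t
count_ic_ivau(cache_sync_plan_t plan, uint64_t start, uint64_t end)
{
    assert(start <= end);
    if (!plan.needs_ic_ivau)
        return 0;
    uint64_t line = plan.icache_line_bytes;
    uint64_t first = start & ~(line - 1);
    return (end - first + line - 1) / line;
}
```

When DIC is set the count is zero for every range, so a tool that learns about modified code by watching for ic ivau is never told which addresses changed.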
@AssadHashmi do you know which ARMv8/v9 version was this introduced in (it's in the N1 manual so presumably in at least 8.2), and do you have any idea on how common this is for implementations to enable this DIC bit?
Hi @derekbruening, I can't see any version data on documentation for CTR_EL0.DIC. This usually means a feature has been available since v8.0. I will dig some more and get back with answers to both questions.
@kuaiwei could you provide the processor make and model on which you see the CTR_EL0.DIC bit set?
Hi @derekbruening , my test processor is Yitian 710 which is based on ARM Neoverse N2 model.
Thank you. The question remains whether any other implementations have icache consistency in hardware and thus set CTR_EL0.DIC. It seems it is unrelated to ARMv9 as that bit has been around since at least v8.2 and probably v8.0 but we have yet to see a v8.x processor that had it set.
Neoverse V1 (used in Graviton3) has CTR_EL0.DIC set.
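To check a given machine, something like the following could be used. It is only a sketch: `read_ctr_el0` and `ctr_el0_has_dic` are hypothetical helper names, it assumes GCC or Clang inline assembly, and it assumes the OS lets user mode read CTR_EL0 (Linux normally does). On non-AArch64 builds it reports failure instead of reading anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads CTR_EL0 into *value. Returns false (and stores 0) when not
 * compiled for AArch64.
 */
static bool
read_ctr_el0(uint64_t *value)
{
#if defined(__aarch64__)
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    *value = ctr;
    return true;
#else
    *value = 0;
    return false;
#endif
}

/* DIC is bit 29 of CTR_EL0. */
static bool
ctr_el0_has_dic(uint64_t ctr)
{
    return ((ctr >> 29) & 1) != 0;
}
```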
Presumably the proper, fully transparent solution is to do on AArch64 whatever is already done on Intel (I've never looked at the details). However, it might also be worth adding an option to mangle MRS Xt, CTR_EL0 so that the app thinks that CTR_EL0.DIC is zero, if that gives a solution that is quicker to implement, or more reliable with some programs, or gives better performance with some programs.
You could mangle the MRS by inserting an AND (immediate) after it but presumably you could also replace the MRS with a MOV (immediate), or sequence thereof, seeing as the value of CTR_EL0 is not expected to change. Or you could insert the AND only when DIC is set. I don't see a great advantage in any of those three ways of doing it.
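A rough sketch of the "insert an AND after the MRS" variant, working on raw instruction words rather than through DynamoRIO's instruction API. The helper names (`mrs_ctr_el0_dest`, `hide_dic`, `encode_and_clear_dic`) are made up for illustration, and the encodings are derived by hand from the A64 MRS and AND (immediate) formats, so they should be checked against an assembler before being relied on.

```c
#include <assert.h>
#include <stdint.h>

#define CTR_EL0_DIC_MASK (UINT64_C(1) << 29)

/* Returns the destination register number if word encodes
 * "mrs Xt, ctr_el0" (0xd53b0020 | Rt), otherwise -1.
 */
static int
mrs_ctr_el0_dest(uint32_t word)
{
    if ((word & ~UINT32_C(0x1f)) != UINT32_C(0xd53b0020))
        return -1;
    return (int)(word & 0x1f);
}

/* The value the app should observe: the real CTR_EL0 with DIC cleared. */
static uint64_t
hide_dic(uint64_t ctr_el0)
{
    return ctr_el0 & ~CTR_EL0_DIC_MASK;
}

/* Encodes "and Xt, Xt, #0xffffffffdfffffff", which clears bit 29 only.
 * The bitmask immediate is 63 ones (imms=62) rotated right by 34 (immr=34)
 * in a 64-bit element (N=1). Register 31 is excluded because it means SP
 * or XZR in this instruction.
 */
static uint32_t
encode_and_clear_dic(unsigned rt)
{
    assert(rt < 31);
    return UINT32_C(0x92000000) | (UINT32_C(1) << 22) | (UINT32_C(34) << 16) |
        (UINT32_C(62) << 10) | ((uint32_t)rt << 5) | (uint32_t)rt;
}
```

A mangler along these lines would scan each block for words where `mrs_ctr_el0_dest` returns a register number below 31 and append the word from `encode_and_clear_dic` right after it. An app that then sees DIC as zero would fall back to issuing ic ivau.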