
Cannot detect self-modifying code range on AArch64 v9

Open kuaiwei opened this issue 3 years ago • 7 comments

Describe the bug see discussion in https://groups.google.com/g/dynamorio-users/c/72wz2q-njCg

When tracing a Java application on Arm v9 hardware, DynamoRIO does not detect changes to JIT-generated code, which causes strange failures. The root cause is that the ic_ivau instruction is not used for icache flushing on newer AArch64 platforms.

To Reproduce Steps to reproduce the behavior:

  1. Use a debug build of DynamoRIO to run the java command, with the log level set to 4:
export DYNAMO_HOME=<Debug build>
export JAVA_HOME=<java install>
$DYNAMO_HOME/bin64/drrun -root $DYNAMO_HOME -verbose -debug -loglevel 4 -c  $DYNAMO_HOME/api/bin/libbbsize.so -- $JAVA_HOME/bin/jps
  2. Check the log output for mangled code stubs for isb and ic_ivau:
grep ic_ivau $DYNAMO_HOME/logs/jps*/*
grep isb $DYNAMO_HOME/logs/jps*/*
  3. Exact output or incorrect behavior: for the ic_ivau instruction, there is no output. For the isb instruction, the log contains lines like:
jps.292482.00000001/log.1.292483.html:  0x0000ffffa2492554  d5033fdf   isb    $0x0f
jps.292482.00000001/log.1.292483.html: +0    L3 @0x0000fffd5e7c4098  d5033fdf   isb    $0x0f
jps.292482.00000001/log.1.292483.html: +0    L3 @0x0000fffd5e7c4098  d5033fdf   isb    $0x0f
jps.292482.00000001/log.1.292483.html:forward_eflags_analysis: isb    $0x0f
jps.292482.00000001/log.1.292483.html: +0    L3 @0x0000fffd5e7c4098  d5033fdf   isb    $0x0f
jps.292482.00000001/log.1.292483.html:  0x0000ffff5e6a6868  d5033fdf   isb    $0x0f
jps.292482.00000001/log.1.292483.html:  0x0000ffffa0c86824  d5033fdf   isb    $0x0f

In the log file, the mangled code for isb is:

before instrumentation:
TAG  0x0000ffffa2492400
 +0    L3 @0x0000fffd5e7c2fc8  d5033fdf   isb    $0x0f
END 0x0000ffffa2492400


after instrumentation:
TAG  0x0000ffffa2492400
 +0    L3 @0x0000fffd5e7c2fc8  d5033fdf   isb    $0x0f
END 0x0000ffffa2492400

vm_list_overlaps 0x0000fffd5e4c24d0 vs 0x0000ffffa2492400-0x0000ffffa2492401
setting cur_pc (for fall-through) to 0x0000ffffa2492404
forward_eflags_analysis: isb    $0x0f
        instr 0 => 0
exit_branch_type=0x0 bb->exit_target=0x0000ffffa2492404
bb ilist after mangling:
TAG  0x0000ffffa2492400
 +0    L3 @0x0000fffd5e7c2fc8  d5033fdf   isb    $0x0f
 +4    m4 @0x0000fffd5e7c4098  f9000380   str    %x0 -> (%x28)[8byte]
 +8    m4 @0x0000fffd5e7c3110  d28f5000   movz   $0x7a80 lsl $0x00 -> %x0
 +12   m4 @0x0000fffd5e7c4460  f2ae29e0   movk   %x0 $0x714f lsl $0x10 -> %x0
 +16   m4 @0x0000fffd5e7c2c88  b9400000   ldr    (%x0)[4byte] -> %w0
 +20   m4 @0x0000fffd5e7c4198  34000000   cbz    @0x0000fffd5e7c3a20[8byte] %w0
 +24   m4 @0x0000fffd5e7c31d8  a9008b81   stp    %x1 %x2 -> +0x08(%x28)[16byte]
 +28   m4 @0x0000fffd5e7c2db0  d287d202   movz   $0x3e90 lsl $0x00 -> %x2
 +32   m4 @0x0000fffd5e7c4218  f2ae2702   movk   %x2 $0x7138 lsl $0x10 -> %x2
 +36   m4 @0x0000fffd5e7c3048  d2848081   movz   $0x2404 lsl $0x00 -> %x1
 +40   m4 @0x0000fffd5e7c4398  f2b44921   movk   %x1 $0xa249 lsl $0x10 -> %x1
 +44   m4 @0x0000fffd5e7c3780  f2dfffe1   movk   %x1 $0xffff lsl $0x20 -> %x1
 +48   m4 @0x0000fffd5e7c4298  aa1c03e0   orr    %xzr %x28 lsl $0x0000000000000000 -> %x0
 +52   m4 @0x0000fffd5e7c4318  d61f0040   br     %x2
 +56   m4 @0x0000fffd5e7c3a20  d61f0040   <label note=0x0000000000000000>
 +56   m4 @0x0000fffd5e7c3958  f9400380   ldr    (%x28)[8byte] -> %x0
 +60   L4 @0x0000fffd5e7c4118  14000000   b      $0x0000ffffa2492404
END 0x0000ffffa2492400                                       

Please also answer these questions:

  • What happens when you run without any client?
  • What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)?

The debug build without a client produces the same result.

Expected behavior The log should show ic_ivau being mangled, so that DynamoRIO can obtain the correct range of dynamically generated code.

Versions

  • What version of DynamoRIO are you using? Debug build from latest DynamoRIO source, commit faa8e4b6774ee90dfac6c51cfadef2effb15abb4
  • Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem? No
  • What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.) Linux aarch64
> uname -a
Linux 5.10.134-12.2.al8.aarch64 #1 SMP Thu Oct 27 02:19:04 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

  • Is your application 32-bit or 64-bit? 64-bit


kuaiwei avatar Dec 02 '22 12:12 kuaiwei

Pasting from the users list email:

The HotSpot JVM uses GCC's __builtin___clear_cache() to flush the icache, and this builtin skips "ic ivau" on newer architectures. The code looks like https://github.com/gcc-mirror/gcc/blob/releases/gcc-10/libgcc/config/aarch64/sync-cache.c .

  /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
  Unification is not required for instruction to data coherence. */
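To make the effect of that comment concrete, here is a minimal sketch of the decision it describes. It is not the libgcc source: the helper names (`needs_ic_ivau`, `icache_line_bytes`, `count_ic_ivau`) are hypothetical, and the CTR_EL0 value is passed in as a plain integer so the logic can be followed on any host. The only facts assumed are the architectural field positions: DIC at bit 29, IDC at bit 28, and IminLine in bits [3:0].

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CTR_EL0_IDC_BIT 28 /* set: data cache clean to PoU not required */
#define CTR_EL0_DIC_BIT 29 /* set: icache invalidate to PoU not required */

/* True when software must still issue IC IVAU after writing code. */
static bool
needs_ic_ivau(uint64_t ctr_el0)
{
    return ((ctr_el0 >> CTR_EL0_DIC_BIT) & 1) == 0;
}

/* Smallest icache line in bytes. IminLine holds log2 of the line size
 * measured in 4-byte words. */
static size_t
icache_line_bytes(uint64_t ctr_el0)
{
    return (size_t)4 << (ctr_el0 & 0xf);
}

/* Number of IC IVAU instructions a flush of [start, end) would execute.
 * This is what a tool watching for IC IVAU gets to observe: zero when
 * DIC is set, one per cache line otherwise. */
static size_t
count_ic_ivau(uint64_t ctr_el0, uintptr_t start, uintptr_t end)
{
    if (!needs_ic_ivau(ctr_el0) || end <= start)
        return 0;
    size_t line = icache_line_bytes(ctr_el0);
    uintptr_t first = start & ~(uintptr_t)(line - 1); /* align down */
    return (end - first + line - 1) / line;
}
```

With DIC clear and 64-byte lines, flushing 0x1000-0x1080 yields two IC IVAU instructions that a tool can intercept. With DIC set, the same flush yields none, which matches the empty `grep ic_ivau` result in the report.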

@AssadHashmi do you know which ARMv8/v9 version was this introduced in (it's in the N1 manual so presumably in at least 8.2), and do you have any idea on how common this is for implementations to enable this DIC bit?

derekbruening avatar Dec 02 '22 16:12 derekbruening

> @AssadHashmi do you know which ARMv8/v9 version was this introduced in (it's in the N1 manual so presumably in at least 8.2), and do you have any idea on how common this is for implementations to enable this DIC bit?

Hi @derekbruening, I can't see any version data on documentation for CTR_EL0.DIC. This usually means a feature has been available since v8.0. I will dig some more and get back with answers to both questions.

AssadHashmi avatar Dec 05 '22 15:12 AssadHashmi

@kuaiwei could you provide the processor make and model on which you see the CTR_EL0.DIC bit set?

derekbruening avatar Dec 16 '22 20:12 derekbruening

Hi @derekbruening, my test processor is the Yitian 710, which is based on the Arm Neoverse N2.

kuaiwei avatar Dec 19 '22 06:12 kuaiwei

> Hi @derekbruening , my test processor is Yitian 710 which is based on ARM Neoverse N2 model.

Thank you. The question remains whether any other implementations have icache consistency in hardware and thus set CTR_EL0.DIC. It seems it is unrelated to ARMv9 as that bit has been around since at least v8.2 and probably v8.0 but we have yet to see a v8.x processor that had it set.
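For anyone who wants to check another processor, a small sketch along these lines could report the relevant bits. The function names are hypothetical. It assumes a GCC or Clang toolchain and that the OS permits (or emulates) EL0 reads of CTR_EL0, as Linux does. On a non-AArch64 host `read_ctr_el0` returns a placeholder 0 so the file still compiles.

```c
#include <stdint.h>
#include <stdio.h>

/* Read CTR_EL0 on AArch64; placeholder value elsewhere. */
static uint64_t
read_ctr_el0(void)
{
#if defined(__aarch64__)
    uint64_t value;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(value));
    return value;
#else
    return 0; /* not AArch64: nothing to read */
#endif
}

/* DIC is bit 29: icache invalidation to PoU not required when set. */
static unsigned
ctr_dic(uint64_t ctr)
{
    return (unsigned)((ctr >> 29) & 1);
}

/* IDC is bit 28: dcache clean to PoU not required when set. */
static unsigned
ctr_idc(uint64_t ctr)
{
    return (unsigned)((ctr >> 28) & 1);
}

/* Print a one-line report suitable for pasting into the issue. */
static void
print_ctr_report(FILE *out, uint64_t ctr)
{
    fprintf(out, "CTR_EL0=0x%016llx DIC=%u IDC=%u\n",
            (unsigned long long)ctr, ctr_dic(ctr), ctr_idc(ctr));
}
```

Calling `print_ctr_report(stdout, read_ctr_el0())` from `main` on the target machine prints the raw register and both bits.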

derekbruening avatar Dec 19 '22 20:12 derekbruening

Neoverse V1 (used in Graviton3) has CTR_EL0.DIC set.

egrimley-arm avatar Jun 23 '25 19:06 egrimley-arm

Presumably the proper, fully transparent solution is to do on AArch64 whatever is already done on Intel (I've never looked at the details). However, it might also be worth adding an option to mangle MRS Xt, CTR_EL0 so that the app thinks that CTR_EL0.DIC is zero, if that gives a solution that is quicker to implement, or more reliable with some programs, or gives better performance with some programs.

You could mangle the MRS by inserting an AND (immediate) after it but presumably you could also replace the MRS with a MOV (immediate), or sequence thereof, seeing as the value of CTR_EL0 is not expected to change. Or you could insert the AND only when DIC is set. I don't see a great advantage in any of those three ways of doing it.
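As a rough illustration of the first and third variants, the sketch below recognises an `MRS Xt, CTR_EL0` encoding and computes the value the app would see with DIC hidden. It is standalone C operating on raw 32-bit encodings, not DynamoRIO's instruction API, and every name in it is hypothetical. The base encoding 0xd53b0020 (`mrs x0, ctr_el0`) follows from the system register encoding op0=3, op1=3, CRn=0, CRm=0, op2=1, with Rt in bits [4:0].

```c
#include <stdbool.h>
#include <stdint.h>

#define MRS_CTR_EL0_BASE 0xd53b0020u /* mrs x0, ctr_el0 */
#define MRS_RT_MASK 0x1fu            /* destination register field */
#define CTR_EL0_DIC (UINT64_C(1) << 29)

/* True if the encoding is MRS Xt, CTR_EL0 for any Xt. */
static bool
is_mrs_ctr_el0(uint32_t enc)
{
    return (enc & ~MRS_RT_MASK) == MRS_CTR_EL0_BASE;
}

/* Destination register number (Rt) of an MRS encoding. */
static unsigned
mrs_dest_reg(uint32_t enc)
{
    return enc & MRS_RT_MASK;
}

/* Value the app should observe: same as hardware, with DIC cleared.
 * This is the effect of an AND (immediate) inserted after the MRS. */
static uint64_t
hide_dic(uint64_t ctr_el0)
{
    return ctr_el0 & ~CTR_EL0_DIC;
}

/* Whether the extra AND is needed at all (the "only when DIC is set"
 * variant). */
static bool
needs_dic_mangling(uint64_t ctr_el0)
{
    return (ctr_el0 & CTR_EL0_DIC) != 0;
}
```

Once the app sees DIC as zero, code such as `__builtin___clear_cache()` goes back to issuing IC IVAU, which the existing handling can then observe.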

egrimley-arm avatar Jun 24 '25 08:06 egrimley-arm