
AArch64 "Undefined HINT instruction found" running simple Java app

Open derekbruening opened this issue 3 years ago • 6 comments

I ran the ReadWrite.java from https://github.com/DynamoRIO/dynamorio/issues/5309#issue-1119299130 on our Jenkins machine and it has several of these warnings:

<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>

I didn't look up what this hint does: does DR need to take any action there, or is this just a missing innocuous opcode?

$ disasm_a64 d503245f
llvm-mc:   d503245f hint #34
capstone:  d503245f hint #0x22
bfd:       d503245f bti c
DynamoRIO: d503245f xx $0xd503245f %sp %x2 %x9 %x3 -> %sp %x2 %x9 %x3

derekbruening avatar Feb 07 '22 16:02 derekbruening

I didn't look up what this hint does: does DR need to take any action there, or is this just a missing innocuous opcode?

Looks like this HINT is the BTI instruction from FEAT_BTI, a feature introduced in Armv8.5 and mandatory from that version on: https://developer.arm.com/documentation/ddi0596/2021-09/Base-Instructions/BTI--Branch-Target-Identification-

The 3 possible <targets> values (plus the form with the operand omitted) are encoded in the CRm:op2 field:

 CRm:op2 = 0100:xx0
      xx   mnemonic   accepted as a target of
      00   BTI        no indirect branch (operand omitted)
      01   BTI C      function calls via BLR, and BR using X16 or X17, into guarded pages
      10   BTI J      jumps using BR only
      11   BTI JC     both of the above

Which implies that the warning "Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)" is a BTI C: 0x22 is 0b0100:010, so xx = 01.

The AArch64 codec supports a small set of HINT ops (see https://github.com/DynamoRIO/dynamorio/blob/c99bcafc7e9057e0adb83fc42784a2fb1220e27e/core/ir/aarch64/codec.txt#L68). I'll raise a PR for BTI.

According to the spec, CPUs with FEAT_BTI enabled will trap any indirect branch into a guarded page that lands on anything other than a compatible BTI.

I don't know the details of DR's code cache and block/fragment linking, but could this be contributing to the performance hits you're seeing? i.e. link optimisations just fail and fall back to context switches into DR's control loop.

AssadHashmi avatar Feb 07 '22 19:02 AssadHashmi

According to the spec, CPUs with FEAT_BTI enabled will trap any indirect branch into a guarded page that lands on anything other than a compatible BTI.

I don't know the details of DR's code cache and block/fragment linking, but could this be contributing to the performance hits you're seeing? i.e. link optimisations just fail and fall back to context switches into DR's control loop.

Would this trap show up as a signal to user mode that we would see?

derekbruening avatar Feb 08 '22 05:02 derekbruening

Would this trap show up as a signal to user mode that we would see?

Yes. According to the AArch64 spec, h/w will raise a Branch Target exception when an indirect branch into a guarded memory region lands on an instruction that isn't a compatible BTI. This comment in the Linux source says that'll be a SIGILL in user-space: https://github.com/torvalds/linux/blob/555f3d7be91a873114c9656069f1a9fa476ec41a/arch/arm64/kernel/signal.c#L746

        /*
         * Signal delivery to a location in a PROT_BTI guarded page
         * that is not a function entry point will now trigger a
         * SIGILL in userspace.
         *
         * If the signal handler entry point is not in a PROT_BTI
         * guarded page, this is harmless.
         */

Addition of BTI support in Linux: https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/

Is it possible for the user to run on h/w which doesn't support FEAT_BTI or a build of the Java app which doesn't generate HINT 0x22? Running without BTI would be a quick way of checking if that is a/the cause.

The C++ binary doesn't display the same performance hit, so could it be the JVM's use of BTI that is causing the problem?
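
One quick way to tell whether a given Linux machine exposes BTI to user space is to look for the `bti` flag on the `Features` line of /proc/cpuinfo. The sketch below assumes that flag name and file layout; the helper names are made up for illustration. A missing flag can also mean the kernel was built without BTI support, so treat it as a hint rather than proof about the hardware.

```python
def cpuinfo_has_bti(cpuinfo_text):
    """Return True if any 'Features' line in the given text lists 'bti'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features" and "bti" in value.split():
            return True
    return False


def host_has_bti(path="/proc/cpuinfo"):
    """Check the running host; returns False if the file cannot be read."""
    try:
        with open(path) as f:
            return cpuinfo_has_bti(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("BTI exposed to user space:", host_has_bti())
```

If this reports no BTI, the HINT 0x22 instructions execute as NOPs on that machine and cannot be raising Branch Target exceptions.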

AssadHashmi avatar Feb 08 '22 13:02 AssadHashmi

Some clarifications/questions:

  • These HINT warnings are what I observed running java on the Jenkins machine. I don't think that has ARMv8.5?
  • @kuhanov -- do you also see these HINT warnings on your machine?
  • If this is being hit and it is raising a SIGILL over and over, your theory is that the JVM is handling the SIGILL and continuing, rather than aborting, and the signal raise and handling is disrupting every hot path/loop?

derekbruening avatar Feb 08 '22 15:02 derekbruening

Some clarifications/questions:

  • These HINT warnings are what I observed running java on the Jenkins machine. I don't think that has ARMv8.5?

It doesn't support FEAT_BTI. So the HINT appears in the instruction stream and is ignored, i.e. treated as NOP.

  • If this is being hit and it is raising a SIGILL over and over, your theory is that the JVM is handling the SIGILL and continuing, rather than aborting, and the signal raise and handling is disrupting every hot path/loop?

Yes, that was my theory but as the machine it's running on doesn't support FEAT_BTI it's irrelevant now.

AssadHashmi avatar Feb 08 '22 15:02 AssadHashmi

  • @kuhanov -- do you also see these HINT warnings on your machine?

No, I didn't see such warnings on my machine.

kuhanov avatar Feb 08 '22 16:02 kuhanov

These are also seen on Mac M1 machines running hello,world:

<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>

derekbruening avatar Jul 02 '23 15:07 derekbruening