dynamorio
i#2626: add Armv8.2 SHA-512 instructions to the decoder
I mostly did this by mimicking the other SHA instructions, so I'm not sure if I did this right. Test cases constructed by running them through as and then objdump -D.
(Context is that we were integrating some assembly for this instruction into BoringSSL, and I noticed neither valgrind nor drmemory supported the instruction. Looks like drmemory doesn't work yet on aarch64 anyway and I'm not sure what the consequences of OP_xx are in the first place. But I figure transcribing the instruction encoding was easy enough to do.)
References:
- https://developer.arm.com/documentation/ddi0602/2021-12/SIMD-FP-Instructions/SHA512H--SHA512-Hash-update-part-1-
- https://developer.arm.com/documentation/ddi0602/2021-12/SIMD-FP-Instructions/SHA512H2--SHA512-Hash-update-part-2-
- https://developer.arm.com/documentation/ddi0602/2021-12/SIMD-FP-Instructions/SHA512SU0--SHA512-Schedule-Update-0-
- https://developer.arm.com/documentation/ddi0602/2021-12/SIMD-FP-Instructions/SHA512SU1--SHA512-Schedule-Update-1-
Issue: #2626
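For readers unfamiliar with what "transcribing the instruction encoding" involves, here is a minimal sketch in C of how the four SHA-512 instruction words are assembled from their register fields. The base opcode constants were worked out by hand from the encoding diagrams in the Arm references above and should be checked against them. The helper names (`encode_3reg`, `encode_2reg`) are hypothetical and are not part of DynamoRIO or its codec.txt format.

```c
#include <stdint.h>

/* Base instruction words with all register fields set to zero.
 * Assumption: derived by hand from the Arm encoding diagrams. */
#define SHA512H_BASE   0xce608000u /* SHA512H   Qd, Qn, Vm.2D    */
#define SHA512H2_BASE  0xce608400u /* SHA512H2  Qd, Qn, Vm.2D    */
#define SHA512SU1_BASE 0xce608800u /* SHA512SU1 Vd.2D, Vn.2D, Vm.2D */
#define SHA512SU0_BASE 0xcec08000u /* SHA512SU0 Vd.2D, Vn.2D     */

/* Three-register form: Rm in bits [20:16], Rn in [9:5], Rd in [4:0]. */
static inline uint32_t
encode_3reg(uint32_t base, unsigned rd, unsigned rn, unsigned rm)
{
    return base | ((rm & 31u) << 16) | ((rn & 31u) << 5) | (rd & 31u);
}

/* Two-register form (SHA512SU0): Rn in bits [9:5], Rd in [4:0]. */
static inline uint32_t
encode_2reg(uint32_t base, unsigned rd, unsigned rn)
{
    return base | ((rn & 31u) << 5) | (rd & 31u);
}
```

A word produced this way, for example `encode_3reg(SHA512H_BASE, 0, 1, 2)` for `sha512h q0, q1, v2.2d`, can be compared against the bytes that `as` followed by `objdump -D` reports for the same assembly line.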
Thank you for the contribution!
@AssadHashmi hoping you can take a look, or triage further.
run arm tests
Please add the comment string # v8.2 after each new definition in codec.txt before merging.
Done.
There are upcoming changes to prepare the codec for new versions of the AArch64 ISA and such strings will help when splitting up definitions by version.
Oh! Is this going to figure into something like drcpusim? Would it be possible to teach the decode which instructions are gated on FEAT_WHATEVER? I see there are functions like instr_is_sse3 for x86. On x86, we (BoringSSL) use Intel SDE to verify cpuid checks (i.e. simulate non-existence of an instruction) and also emulate instructions not on our CI's host machines. If we could do something similar for Arm, that'd be really useful.
Oh! Is this going to figure into something like drcpusim? Would it be possible to teach the decode which instructions are gated on FEAT_WHATEVER? I see there are functions like instr_is_sse3 for x86. On x86, we (BoringSSL) use Intel SDE to verify cpuid checks (i.e. simulate non-existence of an instruction)
Yes, eventually. The first step is to split up codec.txt and get the codec generator to create cleanly separated decode/encode files based on v8.x (codec_v80.txt, codec_v81.txt, codec_v82.txt), v9.x etc. Later patches will read the ID_AA64ISAR0_EL1 register at startup establishing which FEAT_s are supported on the h/w.
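As a rough illustration of the later step described here, the sketch below decodes the SHA2 field of an ID_AA64ISAR0_EL1 value into feature flags. It assumes the field sits in bits [15:12], with value 1 meaning the SHA-256 instructions are present and value 2 additionally meaning the SHA-512 instructions (FEAT_SHA512) are present; please verify that layout against the Arm Architecture Reference Manual. The type and function names are hypothetical and do not reflect how DynamoRIO will implement this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SHA2 is the 4-bit field at bits [15:12] of ID_AA64ISAR0_EL1. */
#define ISAR0_SHA2_SHIFT 12
#define ISAR0_FIELD_MASK 0xfu

/* Hypothetical container for the decoded feature bits. */
typedef struct {
    bool sha256; /* SHA256H, SHA256H2, SHA256SU0, SHA256SU1 */
    bool sha512; /* SHA512H, SHA512H2, SHA512SU0, SHA512SU1 */
} sha2_features_t;

/* Pure decode step, so it can be tested with synthetic register values. */
static inline sha2_features_t
decode_sha2_features(uint64_t isar0)
{
    unsigned field = (unsigned)((isar0 >> ISAR0_SHA2_SHIFT) & ISAR0_FIELD_MASK);
    sha2_features_t f;
    f.sha256 = field >= 1;
    f.sha512 = field >= 2;
    return f;
}

#if defined(__aarch64__)
/* Reads the ID register directly. Assumption: the OS permits or emulates
 * this read from user mode; otherwise the instruction traps. */
static inline uint64_t
read_id_aa64isar0(void)
{
    uint64_t val;
    __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(val));
    return val;
}
#endif
```

Keeping the decode separate from the register read means the feature logic can be exercised on any host, which matters for the kind of testing discussed in this thread.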
and also emulate instructions not on our CI's host machines. If we could do something similar for Arm, that'd be really useful.
That's not currently planned.
On x86, we (BoringSSL) use Intel SDE to verify cpuid checks (i.e. simulate non-existence of an instruction) and also emulate instructions not on our CI's host machines. If we could do something similar for Arm, that'd be really useful.
Sorry, I think you meant that BoringSSL emulates instructions not on your CI's host machines, didn't you? That is, after DynamoRIO tells you which instructions are not supported on the h/w.
No, we don't emulate instructions in our tooling. The situation is we have a lot of code that uses such-and-such new instruction, and gates it on a CPU capability check. We would like to test two things:
1. We correctly gated the right functions on the right CPU capability checks. If we test on, say, a machine with AVX, we won't notice if we forgot to gate the AVX code on the AVX bit in CPUID.
2. The code using the new instructions is correct. If we test on, say, a machine without AVX, we won't notice if the AVX code has a bug.
Intel SDE solves both those problems for us. We can tell it what CPU to mimic and it'll both simulate a lack of instructions for (1) and it'll emulate instructions newer than the host hardware for (2). We require both capabilities to adequately test.
As I understand it, drcpusim, if it worked on Arm, would cover (1) but not (2). That would already be a big improvement, although I'm hoping DynamoRIO could be used to also address (2), perhaps in drcpusim itself. (Emulating NEON is probably too large of a target, but things like FEAT_SHA512 should be doable.)
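To make the two testing needs concrete, here is a small sketch of capability-gated dispatch in C. Everything in it is hypothetical: `cpu_caps_t`, `select_sum`, and the two toy summing routines stand in for a real capability probe and for real generic and accelerated code paths. It is not BoringSSL's actual dispatch code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability record, filled in by a CPU probe or a simulator. */
typedef struct {
    bool has_sha512;
} cpu_caps_t;

typedef uint64_t (*sum_fn)(const uint8_t *data, size_t len);

/* Portable fallback path: one byte per iteration. */
static inline uint64_t
sum_generic(const uint8_t *data, size_t len)
{
    uint64_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    return s;
}

/* Stand-in for the accelerated path: four bytes per iteration plus a tail.
 * A real version would use the new instructions. */
static inline uint64_t
sum_accel(const uint8_t *data, size_t len)
{
    uint64_t s = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4)
        s += (uint64_t)data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    for (; i < len; i++)
        s += data[i];
    return s;
}

/* The gate: need (1) is that this check exists and tests the right bit. */
static inline sum_fn
select_sum(const cpu_caps_t *caps)
{
    return caps->has_sha512 ? sum_accel : sum_generic;
}
```

Checking (1) means running with the capability reported as absent and confirming the accelerated path is never reached. Checking (2) means running with it reported as present and comparing the accelerated result against the generic one, which on older hardware requires the tool to emulate the new instructions.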
I filed #5311 on adding AArch64 support to the existing drcpusim, and #5312 on your suggestion of extending drcpusim to emulate missing features.