
Add zero-masking vs merge-masking to IR and handle in scatter expansion

Open derekbruening opened this issue 3 years ago • 1 comments

The AVX-512 EVEX.z bit ({z} in assembler syntax when set to 1) controls whether destination elements whose mask bit is 0 are zeroed (zero-masking) or left unchanged (merge-masking). Today this is not represented in DR's IR, nor is zero-masking handled in scatter-gather expansion. This issue covers addressing both.
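
To make the two modes concrete, here is a minimal scalar sketch of a masked element-wise add. It is an illustration only, not DR code: the function name `masked_add_u32` and its `zeroing` parameter are made up for this example, and a real EVEX instruction operates on vector registers rather than arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked add on up to 16 32-bit elements.
 * zeroing != 0 models {z} (zero-masking); zeroing == 0 models merge-masking.
 */
void
masked_add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b,
               uint16_t mask, size_t n, int zeroing)
{
    assert(n <= 16); /* The 16-bit mask covers at most 16 elements. */
    for (size_t i = 0; i < n; i++) {
        if ((mask >> i) & 1)
            dst[i] = a[i] + b[i]; /* Mask bit set: element is written. */
        else if (zeroing)
            dst[i] = 0;           /* Mask bit clear + {z}: element is zeroed. */
        /* Mask bit clear without {z}: dst[i] keeps its old value. */
    }
}
```

With a mask of 0x5 over four elements, elements 0 and 2 receive the sums in both modes; elements 1 and 3 keep their old values when merging and become 0 when zeroing.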

Generally, we want the IR to be an abstraction, mapping ISA encoding details into general concepts. The various prefixes in x86 map into operand size differences or opcode differences today. This zero-masking prefix bit doesn't affect just one operand: it affects the operation. The precedent there is to split the opcode. We have separate opcodes for OP_rep_movs vs OP_movs. This is similar to the "sub-opcode" numeric values indicating behavior in ARM (see the discussion at https://github.com/DynamoRIO/dynamorio/pull/4386#discussion_r462356954 regarding opcode philosophy of separate opcodes for separate semantics; see also #4388).

We do expose some x86 encoding prefixes in instr_t.prefixes today, but they generally change the semantics so little that most tools, including taint-tracking tools, can completely ignore them. Part of me wants to get rid of instr_t.prefixes entirely (other than predication), so that tools don't have to worry about this separate set of flags controlling behavior, and split up opcodes where behavior differs. But if we did that for zero-masking we might end up with a huge number of split opcodes, plus compatibility issues since the existing opcodes have been public for a while. So maybe embracing the prefixes is an ok solution for these behavior changes that apply to many different opcodes. I would name this something like PREFIX_MASK_ZERO, though, to try to be cross-platform in case another ISA has the same concept, and not put anything about "EVEX" in there.
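
As a rough sketch of what the prefix-flag option could mean for a tool, the code below models a taint tracker consulting such a bit. Everything here is hypothetical: `sketch_instr_t`, `SKETCH_PREFIX_MASK_ZERO` and `sketch_dst_taint` are stand-ins invented for this example, not DR's instr_t or any existing DR constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix bit; the value is a placeholder for this sketch. */
#define SKETCH_PREFIX_MASK_ZERO 0x1u

/* Hypothetical stand-in for an instruction: prefix flags plus the
 * runtime value of its mask register (one bit per element).
 */
typedef struct {
    uint32_t prefixes;
    uint16_t mask;
} sketch_instr_t;

/* Returns the per-element taint bits of the destination after the
 * instruction, given the taint of the source and the old destination.
 */
uint16_t
sketch_dst_taint(const sketch_instr_t *in, uint16_t src_taint,
                 uint16_t old_dst_taint)
{
    assert(in != NULL);
    uint16_t written = (uint16_t)(src_taint & in->mask);
    if (in->prefixes & SKETCH_PREFIX_MASK_ZERO)
        return written; /* Masked-off elements are zeroed: taint cleared. */
    /* Merging: masked-off elements keep their old value and old taint. */
    return (uint16_t)(written | (old_dst_taint & (uint16_t)~in->mask));
}
```

The point of the sketch is that a tool which ignores the flag would mispropagate taint for the masked-off elements, which is the kind of behavior difference the naming and placement discussion is about.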

derekbruening avatar May 09 '22 18:05 derekbruening

I see that x86 prefixes are somewhat similar to OP_sys on AArch64. But there are two key differences that I think justify a different design (instead of splitting the opcodes):

  1. OP_sys had a small number of sub-opcodes, so it was easy to define each of them as a separate opcode. The x86 prefixes, by contrast, affect every instruction that can take them: for the z bit, we'd have to define a new variant for many AVX-512 opcodes. To make it worse, multiple prefix bits can potentially be combined, creating even more combinations and more opcodes.
  2. Some OP_sys sub-opcodes were fundamentally different (e.g. DC ZVA had to be marked as a store), so separating its opcode made sense. However, the x86 prefixes don't change the opcode behavior that much, so creating separate opcodes would just add cruft.

I vote to expose the z bit in the prefix, and naming it something generic like you suggested.

Tools should really remember to look at the prefixes if they care. The x86 ISA also has prefixes, so the concept is not specific to DR.

abhinav92003 avatar May 10 '22 15:05 abhinav92003

> nor is zero-masking handled in scatter-gather expansion.

Looking more closely at the AVX-512 gather (https://www.felixcloutier.com/x86/vpgatherdd:vpgatherdq), it looks like it doesn't support zeroing-masking and always performs merging-masking. So the scatter/gather expansion doesn't need to worry about this.
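
For reference, a simplified scalar model of that merging behavior might look like the following. This is a sketch under assumptions, not the actual expansion code: `gather_merge_u32` is a made-up name, the indices are plain element indices rather than scaled byte offsets, and faults are not modelled. It also clears each mask bit as its element is loaded, which is how I read the instruction description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a merging gather of up to 16 dwords.
 * Only elements whose mask bit is set are loaded; the rest of dst is
 * left untouched. Each mask bit is cleared once its element is loaded.
 */
void
gather_merge_u32(uint32_t *dst, const uint32_t *base, const int32_t *index,
                 uint16_t *mask, size_t n)
{
    assert(n <= 16); /* The 16-bit mask covers at most 16 elements. */
    for (size_t i = 0; i < n; i++) {
        if ((*mask >> i) & 1) {
            dst[i] = base[index[i]];
            *mask = (uint16_t)(*mask & ~(1u << i));
        }
        /* Mask bit clear: dst[i] is merged, i.e. left unchanged. */
    }
}
```

Because there is no zeroing variant, a scalar expansion along these lines only ever needs the "skip the element" path for masked-off lanes.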

abhinav92003 avatar Nov 09 '22 16:11 abhinav92003

In discussions about how to represent the similar zero vs merging in AArch64 SVE in PR #5718 we decided that putting a flag on the predicate/mask register operand is the way to go, rather than the whole-instruction prefix flag proposed here. The predicate/mask register is already controlling the operation by selecting the sources, so adding the zero vs merge semantics on just that operand seems reasonable.
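
A minimal sketch of the operand-flag idea, for comparison with the prefix approach: the masking mode is read off the mask/predicate operand itself. The types and flag values below are placeholders invented for this example (`sketch_opnd_t`, `SKETCH_OPND_IS_*`), not DR's real opnd_t or its actual flag definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for this sketch only. */
enum {
    SKETCH_OPND_IS_MERGE_PREDICATE = 0x1,
    SKETCH_OPND_IS_ZERO_PREDICATE = 0x2,
};

/* Hypothetical stand-in for a mask/predicate register operand. */
typedef struct {
    int reg;        /* Register number, e.g. 1 for k1. */
    uint32_t flags; /* Zero or one of the SKETCH_OPND_IS_* flags. */
} sketch_opnd_t;

typedef enum { MASK_NONE, MASK_MERGE, MASK_ZERO } mask_mode_t;

/* Derives the masking mode from the operand alone, with no need to
 * consult any whole-instruction prefix field.
 */
mask_mode_t
sketch_mask_mode(const sketch_opnd_t *mask_opnd)
{
    assert(mask_opnd != NULL);
    if (mask_opnd->flags & SKETCH_OPND_IS_ZERO_PREDICATE)
        return MASK_ZERO;
    if (mask_opnd->flags & SKETCH_OPND_IS_MERGE_PREDICATE)
        return MASK_MERGE;
    return MASK_NONE;
}
```

A tool walking the operands already has to look at the mask register to know which elements are selected, so finding the zero vs merge semantics in the same place keeps that logic together.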

derekbruening avatar Nov 09 '22 16:11 derekbruening

For AArch64 SVE we now have DR_OPND_IS_MERGE_PREDICATE and DR_OPND_IS_ZERO_PREDICATE so I assume we would use those on x86 too?

derekbruening avatar Aug 18 '23 15:08 derekbruening

Makes sense to use the same ones on x86. Any reason we shouldn't?

Interesting to note that x86 gathers are always merging but AArch64 gathers are zeroing.

abhinav92003 avatar Aug 22 '23 15:08 abhinav92003