Confusing AT&T opcode suffix encoding

Open · sdasgup3 opened this issue 5 years ago · 4 comments

Hello team, I am confused by the AT&T mnemonic suffix shown in the following example:

./xed -64 -A -d F3410F7E00
F3410F7E00
ICLASS: MOVQ   CATEGORY: DATAXFER   EXTENSION: SSE2  IFORM: MOVQ_XMMdq_MEMq_0F7E   ISA_SET: SSE2
SHORT: movqq  (%r8), %xmm0

Is the mnemonic movqq correct? It is not accepted by GNU as, or even by xed. To double-check, I assembled movq (%r8), %xmm0 with as, ran ./xed -A -64 -i <assembled file>, and got the same mnemonic, movqq.

Please help.

sdasgup3, Jan 19 '19 23:01

  1. The encoder does not use AT&T syntax; it uses its own syntax. The new asmparse syntax is closer to a real assembler's syntax but is still a work in progress. XED's asmparse uses Intel syntax.

  2. Personally, I would be delighted if the AT&T SysV syntax disappeared. I know that is not practical (Linux...), so don't flame me; I can dream... Why? Because, AFAIK, there is no actual specification for that syntax variant. (This is where the internet finds the info pages for binutils/gas and points me to them. Please don't. Or, better yet, some old spec from the 1980s...)

  3. All that said, XED's size-based suffix-appending algorithm for AT&T SysV syntax is pretty simple and is apparently broken in this case (see the comment above about not having a spec). I guess I will have to look into it...
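To make the failure mode concrete, here is a minimal Python sketch of a naive size-based suffix rule and one possible guard. It is not XED's code: the `Operand` type, the function names, and the "skip the suffix when a register operand is present" rule are all hypothetical, modeled loosely on how GNU objdump output usually looks. The guard is a simplification: it gets `movq (%r8), %xmm0` right, but it would be wrong for instructions such as `movzbl` or `cvtsi2sdl`, whose register operand does not determine the memory size.

```python
"""Hypothetical sketch of size-based AT&T suffix appending (not XED's code)."""
from dataclasses import dataclass
from typing import Optional

# AT&T size suffixes keyed by operand width in bytes.
SUFFIX_BY_BYTES = {1: "b", 2: "w", 4: "l", 8: "q"}


@dataclass
class Operand:
    kind: str                          # "reg", "mem", or "imm" (hypothetical)
    width_bytes: Optional[int] = None  # operand width, if known


def naive_suffix(mnemonic: str, operands: list[Operand]) -> str:
    """Append a suffix for any sized memory operand; reproduces 'movqq'."""
    for op in operands:
        if op.kind == "mem" and op.width_bytes in SUFFIX_BY_BYTES:
            return mnemonic + SUFFIX_BY_BYTES[op.width_bytes]
    return mnemonic


def guarded_suffix(mnemonic: str, operands: list[Operand]) -> str:
    """Assumed rule: a register operand disambiguates, so add no suffix."""
    if any(op.kind == "reg" for op in operands):
        return mnemonic
    return naive_suffix(mnemonic, operands)


if __name__ == "__main__":
    # movq (%r8), %xmm0: 8-byte memory source, XMM register destination.
    movq_ops = [Operand("mem", 8), Operand("reg", 16)]
    print(naive_suffix("movq", movq_ops))    # movqq
    print(guarded_suffix("movq", movq_ops))  # movq
    # add $1, (%rax) on a 4-byte memory operand: the suffix is required.
    add_ops = [Operand("imm", 1), Operand("mem", 4)]
    print(guarded_suffix("add", add_ops))    # addl
```

The point of the sketch is only that appending a size suffix to a mnemonic that already names its size (as SSE `movq` does) yields `movqq`; a real fix needs per-instruction knowledge that a single size rule cannot capture.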

markcharney, Jan 20 '19 00:01

Thanks @markcharney

sdasgup3, Jan 20 '19 00:01

@markcharney Can you help me with the following questions? (Sorry if this is not the right thread for this.)

  1. How were XED's ICLASSes assigned?
     1.1. Is each ICLASS exclusive to one instruction variant (memory/register/immediate) and, for immediate instructions, to a specific immediate width?
  2. Are there any instances where two instructions with the same ICLASS have radically different behaviour?

sdasgup3, Jan 20 '19 20:01

For the most part, iclasses are how most of us think about instructions. The REP/LOCK-prefixed forms complicate that simple picture, and so do aliased encodings. Did you mean iclass or iform? The XED iform incorporates operand information to further disambiguate an encoding. Iforms are also defined based on the operand specifications.
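As a rough illustration of "iform = iclass plus operand information", the Python sketch below builds an iform-style name from an iclass, its operands, and the opcode bytes, then groups decoded instructions by iclass. The naming pattern is inferred from the single XED output line in this issue (`MOVQ_XMMdq_MEMq_0F7E`) and is an assumption, not XED's actual generator; the register-source form in the example and the helper names `iform_name` and `iforms_per_iclass` are hypothetical.

```python
"""Hypothetical sketch: an iform-style key refines an iclass with operand info."""
from collections import defaultdict


def iform_name(iclass: str, operands: list[tuple[str, str]], opcode: str) -> str:
    """Join the iclass, each operand's kind+width, and the opcode bytes."""
    parts = [iclass] + [kind + width for kind, width in operands] + [opcode]
    return "_".join(parts)


def iforms_per_iclass(decoded: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each iclass to the set of distinct iforms seen for it."""
    groups: dict[str, set[str]] = defaultdict(set)
    for iclass, iform in decoded:
        groups[iclass].add(iform)
    return dict(groups)


if __name__ == "__main__":
    mem_form = iform_name("MOVQ", [("XMM", "dq"), ("MEM", "q")], "0F7E")
    # Hypothetical register-source variant of the same iclass.
    reg_form = iform_name("MOVQ", [("XMM", "dq"), ("XMM", "q")], "0F7E")
    print(mem_form)  # MOVQ_XMMdq_MEMq_0F7E
    # One iclass, two distinct iforms.
    print(iforms_per_iclass([("MOVQ", mem_form), ("MOVQ", reg_form)]))
```

Grouping by the stored iclass, rather than by splitting the iform string, matters because some iclass names themselves contain underscores.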

If you look at the STTNI (SSE4.2 string and text) instructions, the same iclass can do quite different things depending on the bits of the imm8 operand. And clearly a short REP string operation is very different from one that traverses gigabytes. Some x86 instructions (like CALL) are very complicated and can do radically different things. So I'm not really sure what you are asking.
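To show how much a single STTNI iclass (for example PCMPISTRI) varies with its immediate, here is a small Python sketch that decodes the imm8 control byte using the field layout from the Intel SDM: bits 1:0 select the source data format, bits 3:2 the aggregation operation, bits 5:4 the polarity, and bit 6 the output selection (bit 7 is reserved). The function name `decode_sttni_imm8` is hypothetical, and the bit-6 interpretation below is the one for the index-returning PCMPxSTRI forms; for the mask-returning PCMPxSTRM forms, bit 6 instead chooses a bit mask versus an expanded byte/word mask.

```python
"""Sketch: decode the imm8 control byte of the SSE4.2 STTNI instructions."""

FORMATS = {0b00: "unsigned bytes", 0b01: "unsigned words",
           0b10: "signed bytes", 0b11: "signed words"}
AGGREGATIONS = {0b00: "equal any", 0b01: "ranges",
                0b10: "equal each", 0b11: "equal ordered"}
POLARITIES = {0b00: "positive", 0b01: "negative",
              0b10: "masked positive", 0b11: "masked negative"}


def decode_sttni_imm8(imm8: int) -> dict[str, str]:
    """Split imm8 into its control fields (PCMPxSTRI interpretation of bit 6)."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return {
        "format": FORMATS[imm8 & 0b11],                   # bits 1:0
        "aggregation": AGGREGATIONS[(imm8 >> 2) & 0b11],  # bits 3:2
        "polarity": POLARITIES[(imm8 >> 4) & 0b11],       # bits 5:4
        # Bit 6: report the least or most significant matching index.
        "index": "most significant" if imm8 & 0x40 else "least significant",
    }


if __name__ == "__main__":
    # Same instruction, different immediates, different operations.
    print(decode_sttni_imm8(0x00)["aggregation"])  # equal any (character set)
    print(decode_sttni_imm8(0x04)["aggregation"])  # ranges
    print(decode_sttni_imm8(0x0C)["aggregation"])  # equal ordered (substring)
```

An iform cannot capture these differences, because the operand types are identical across all immediates; only the immediate's value changes the behavior.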

markcharney, Jan 22 '19 19:01