x86reference icon indicating copy to clipboard operation
x86reference copied to clipboard

Redundant second opcode values instead of mod attribute

Open Kashio opened this issue 6 years ago • 9 comments

There're couple of entries in the reference that use the sec_opcd attribute instead of just using the mod attribute which making automatic parsing a bit more cumbersome since there're no need for the sec_opcd at all. For example FCOM instruction (opcode D8 and opcode extension 2) has 2 distinct entries instead of 2 syntaxes. The second entry indicating that the sec_opcd D1 must be matched but it's redundant since it could've been written exactly like the FADD instruction (opcode D8 and opcode extensions 0) with 2 syntaxes one indicating mod memory and the second indicating mod register. I could've written the FADD instruction in the same format with 2 entries where the second one has a sec_opcd with value of C0 (mod bits set to 11 and opcode extension set to 000).

Kashio avatar Feb 04 '19 19:02 Kashio

Thanks for pointing this out. The FCOM without operands must be exactly the opcode D8D1 (yes, the opcode extension 2 is redundant here). In other words, whenever you disassemble FCOM ST, ST1, it should be disassembled to FCOM without operands in strict Intel syntax. Given the way the reference is structured, it is not possible to express this using just another syntax inside the same entry.

If you look up FADD in Intel manual, there is no form of FADD without operands. But there is FADDP without operands.

I think I was overly obsessive to follow the Intel syntax that is just... weird and erratic.

If you work with the XML reference, I would always suggest to transform it to the structure that suits you better, for example to remove all these "non operand" entries that noone really cares of.

BarebitOpenSource avatar Feb 04 '19 22:02 BarebitOpenSource

Can't you do 2 syntaxes like the FADD like so:

<syntax mod="mem">
    <mnem>FCOM</mnem>
    <src nr="0" group="x87fpu" displayed="no">ST</src>
    <src>
        <a>ES</a>
        <t>sr</t>
    </src>
</syntax>
<syntax mod="nomem">
    <mnem>FCOM</mnem>
    <src nr="0" group="x87fpu" displayed="no">ST</src>
    <src nr="1" group="x87fpu" address="EST" displayed="no">ST1</src>
</syntax>

I mean The sec_opcd D1 is redundant in the sense that if I have the second syntax as nomem I know the modrm byte mod field should be 11 and with the opcode extension field set to 2 I know the reg field should be 010 and combining this to complete byte 11010000 - D0 meaning the nomem syntax starting from D0 all the way up to D7. I had all the information I need to get this range value from D0 to D7 without the sec_opcd field. A complete rewrite of FCOM entry would look like so:

<entry mem_format="00">
<opcd_ext>2</opcd_ext>
<syntax mod="mem">
    <mnem>FCOM</mnem>
    <src nr="0" group="x87fpu" displayed="no">ST</src>
    <src>
        <a>ES</a>
        <t>sr</t>
    </src>
</syntax>
<syntax mod="nomem">
    <mnem>FCOM</mnem>
    <src nr="0" group="x87fpu" displayed="no">ST</src>
    <src nr="1" group="x87fpu" address="EST" displayed="no">ST1</src>
</syntax>
<grp1>x87fpu</grp1>
<grp2>compar</grp2>
<modif_f_fpu>0123</modif_f_fpu>
<def_f_fpu>0123</def_f_fpu>
<note>
<brief>Compare Real</brief>
</note>
</entry>

and it still complies to the strict no operands for the nomem syntax since I left the displayed attribute set to no, so a parser of the XML reference should take this into account

Kashio avatar Feb 05 '19 22:02 Kashio

If I remember well, you're right, but I need to check it once more.

BarebitOpenSource avatar Sep 23 '21 07:09 BarebitOpenSource

I actually checked it again and I think I'm still correct 🙃 The sec_opcd is not only redundant but also limiting, because if a parser would assume it to be correct then it should be obvious the instruction is only useable with this sec_opcd but actually it can be any byte in the range of D0-D7 for the ModR/M register variant. I would also be thinking of changing the src operand: <src nr="1" group="x87fpu" address="EST" displayed="no">ST1</src> To something more general like: <src nr="i" group="x87fpu" address="EST" displayed="no">STi</src> If you don't have time to fix it I'll probably open PR soon addressing some of the issues here. I'm probably going to support AVX and other extensions not in the XML yet in some point.

Kashio avatar Sep 23 '21 08:09 Kashio

As for the AVX, I need to add it too, and you seem to understand the XML structure and the rules quite well. Could we collaborate on AVX? I think it would be better to discuss it in private. If you're interested, my primary email is [email protected].

BarebitOpenSource avatar Sep 23 '21 08:09 BarebitOpenSource

As for the AVX, I need to add it too, and you seem to understand the XML structure and the rules quite well. Could we collaborate on AVX? I think it would be better to discuss it in private. If you're interested, my primary email is [email protected].

Sent you an email it's pretty long 🥳

Kashio avatar Sep 23 '21 12:09 Kashio

If I remember well, you're right, but I need to check it once more.

I've checked it again and I'm now wondering why even 2 syntaxes are necessary? Can't we do 1 syntax where the address method is ES and the operand type is sr? No need for mod="mem" mod="nomem" attributes since the instruction support both memory addressing as-well as register addressing. Same applies for more instructions in these group, like FADD which I've stated use the mod="mem" mod="nomem" attributes, but for what reason, it's better of just use 1 syntax with ES and sr as it support both addressing as-well. Am I wrong here?

Kashio avatar Sep 24 '21 22:09 Kashio

I looked into it finally. The answer to the first question is that you're right, two entries are not needed.

The answer to the second question: we need to indicate the specific syntax for FCOM ST, ST1 that is FCOM. This can't be expressed using just one syntax element.

BarebitOpenSource avatar Sep 26 '21 15:09 BarebitOpenSource

I looked into it finally. The answer to the first question is that you're right, two entries are not needed.

The answer to the second question: we need to indicate the specific syntax for FCOM ST, ST1 that is FCOM. This can't be expressed using just one syntax element.

You're correct I've missed that the memory syntax second operand type is sr 32 bits, while the register syntax second operand type is, well there's not type specified but it is an ST(i) register, so it's 80 bits. But I think more importantly, the instruction in it's register addressing should support comparison of ST(0) to any of the ST(i) registers and not only ST(1): FCOM D8

D8 D0+i | FCOM ST(i) | Compare ST(0) with ST(i).

So like I've previously said, I think in such cases the register addressing entry, second operand should be changed from: <src nr="1" group="x87fpu" address="EST" displayed="no">ST1</src> to something like: <src nr="i" group="x87fpu" address="EST" displayed="no">ST(i)</src>

The entry should be:

<entry mem_format="00">
<opcd_ext>2</opcd_ext>
<syntax mod="mem">
    <mnem>FCOM</mnem>
    <src nr="0" group="x87fpu" displayed="no">ST</src>
    <src>
        <a>ES</a>
        <t>sr</t>
    </src>
</syntax>
<syntax mod="nomem">
    <mnem>FCOM</mnem>
    <src nr="0" group="x87fpu" displayed="no">ST</src>
    <src nr="i" group="x87fpu" address="EST" displayed="no">ST(i)</src>
</syntax>
<grp1>x87fpu</grp1>
<grp2>compar</grp2>
<modif_f_fpu>0123</modif_f_fpu>
<def_f_fpu>0123</def_f_fpu>
<note>
<brief>Compare Real</brief>
</note>
</entry>

I think this also applies to FCOMP D8 /3 and maybe some other instructions I'm not really thinking about, what do you say? EDIT: After some more digging in the XML I think pretty much anywhere address EST is used, it's for an instruction that could potentially reference any ST(i) register, although the text of the element explicitly state ST1. There are some instructions like FLY2X where an explicit ST(i) register needs to use like ST1 and it doesn't use the EST addressing and just use the nr=1 attribute with ST1 in the text of the element like so:

<dst nr="1" group="x87fpu" displayed="no">ST1</dst>
<!--  no @address="EST"!  -->

So I think an appropriate change for these EST addressing instruction would be, to either change it like I've suggested before: <src nr="i" group="x87fpu" address="EST" displayed="no">ST(i)</src> or we can drop the nr attribute and the element text all together like so: <src group="x87fpu" address="EST" displayed="no"/>

Kashio avatar Sep 27 '21 14:09 Kashio