asmdb
                        confusion around x86 "and" instructions
These two seem to conflict:
  ["and"              , "X:r32/m32, id/ud" , "MI"      , "81 /4 id"                     , "ANY _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
  ["and"              , "X:r64, ud"        , "MI"      , "81 /4 id"                     , "X64 _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
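This is on purpose: you can encode a 64-bit AND with an unsigned 32-bit immediate by not promoting the instruction to 64-bit. A 32-bit operation on a register zero-extends its result into bits 63:32, and ANDing with a zero-extended immediate would clear those bits anyway, so the result is the same as with the former row. It's only possible when the operand is a register, though; a 32-bit write to memory leaves the upper 4 bytes of a 64-bit location untouched.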
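A small Python model of why the alias works for registers but not for memory. It is a sketch that tracks only the destination value (flags are not modeled), and the function names are hypothetical, not part of asmdb.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def and_r64_ud(r64: int, ud: int) -> int:
    """Semantics of the alias row: 64-bit AND with a zero-extended 32-bit immediate."""
    return (r64 & (ud & MASK32)) & MASK64


def and_r32_form(r64: int, imm32: int) -> int:
    """What '81 /4 id' without REX.W does to a register: a 32-bit AND whose
    result is zero-extended into bits 63:32 of the full register."""
    return (r64 & MASK32) & (imm32 & MASK32)


def and_m32_form(mem: bytes, imm32: int) -> bytes:
    """The same encoding with a memory operand writes only 4 bytes,
    so bytes 4..7 of an 8-byte location keep their old contents."""
    low = int.from_bytes(mem[:4], "little") & imm32
    return low.to_bytes(4, "little") + mem[4:]


# Register form: both views give the same value.
r = 0x1122_3344_5566_7788
print(hex(and_r64_ud(r, 0xFFFF_0000)), hex(and_r32_form(r, 0xFFFF_0000)))  # 0x55660000 0x55660000

# Memory form: the upper dword survives, so this is not a 64-bit AND.
mem = r.to_bytes(8, "little")
print(hex(int.from_bytes(and_m32_form(mem, 0xFFFF_0000), "little")))  # 0x1122334455660000
```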
Suppose I am looking at this from the perspective of a decoder and I encounter a byte sequence that matches
"81 /4 id"
How do I know which "rule" applies? In other words: is this a 32-bit or a 64-bit instruction?
Maybe this is dependent on the processor mode?
I would also expect the "or" instruction to have similar (symmetric) rules, but I did not see any.
When decoding, you should always decode to the original instruction and consider all other forms as just aliases. The encoder would support the alias (or not, depending on how you see it), but the decoder would always decode to a canonical representation.
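For the "81 /4 id" case, the canonical form can be picked from the bytes alone: in 64-bit mode the operand size is decided by REX.W, never by which table row happens to match. The sketch below assumes 64-bit mode (in 32-bit mode bytes 0x40..0x4F are not REX prefixes), no legacy prefixes, and register operands only; `decode_and_81`, `REG32` and `REG64` are hypothetical names, not asmdb API.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_and_81(code: bytes) -> str:
    """Hypothetical helper: decode '[REX] 81 /4 id', register form, 64-bit mode.

    Without REX.W the result is always the canonical 'and r32, imm32';
    the 'and r64, ud' row is treated as an encoder-only alias and never produced.
    """
    i = 0
    rex = 0
    if 0x40 <= code[i] <= 0x4F:      # REX prefix (only exists in 64-bit mode)
        rex = code[i]
        i += 1
    if code[i] != 0x81:
        raise ValueError("not opcode 0x81")
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 4:
        raise ValueError("ModRM.reg is not /4, so this is not AND")
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    rm |= (rex & 0x1) << 3           # REX.B extends the ModRM.rm register number
    imm = int.from_bytes(code[i + 2:i + 6], "little")
    if rex & 0x8:                    # REX.W: 64-bit operand size
        if imm & 0x8000_0000:        # imm32 is sign-extended to 64 bits
            imm |= 0xFFFF_FFFF_0000_0000
        return f"and {REG64[rm]}, {imm:#x}"
    return f"and {REG32[rm]}, {imm:#x}"


print(decode_and_81(bytes([0x81, 0xE0, 0xFF, 0xFF, 0x00, 0x80])))        # and eax, 0x8000ffff
print(decode_and_81(bytes([0x48, 0x81, 0xE0, 0xFF, 0xFF, 0x00, 0x80])))  # and rax, 0xffffffff8000ffff
```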
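OR doesn't have that capability. The 32-bit encoding zero-extends into the high half of the 64-bit register, which is exactly what AND r64, ud does (the zero-extended immediate clears the high half anyway), but OR r64, ud encoded as a 32-bit operation would effectively compute (r64 | ud) & 0xFFFFFFFF instead of r64 | ud.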
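A minimal Python illustration of the OR case, tracking only the register value (flags are not modeled); the function names are hypothetical. "81 /1 id" is the OR r/m32, imm32 encoding.

```python
MASK32 = 0xFFFF_FFFF


def or_r64_ud(r64: int, ud: int) -> int:
    """What a hypothetical 'or r64, ud' row would promise: bits 63:32 preserved."""
    return r64 | (ud & MASK32)


def or_r32_form(r64: int, imm32: int) -> int:
    """What '81 /1 id' without REX.W actually does: a 32-bit OR whose result
    is zero-extended, which wipes bits 63:32 of the full register."""
    return (r64 | imm32) & MASK32


r = 0x1234_5678_0000_0000
print(hex(or_r64_ud(r, 0xFF)))    # 0x12345678000000ff
print(hex(or_r32_form(r, 0xFF)))  # 0xff
```

The two only agree when the upper half of the register is already zero, so the 32-bit encoding cannot serve as an alias for a 64-bit OR.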
Ah, I see. Is there a programmatic way to determine which instructions are "original"? I noticed some instructions have an AltForm tag, but that seems to be something slightly different.
I found another conflict:
  ["and"              , "X:eax, id/ud" , "I"       , "25 id"                        , "ANY AltForm      OF=0 SF=W ZF=W AF=U PF=W CF=0"],
  ["and"              , "X:rax, ud"  , "I"       , "25 id"                        , "X64 AltForm      OF=0 SF=W ZF=W AF=U PF=W CF=0"],
These are the only two such cases I found in the fairly large part of the tables that I process.
This seems like an odd exception, given that the pattern is not repeated for any other ALU-type instruction.
I spoke too soon. Here is another ambiguity of a slightly different flavor:
    ["movss"            , "w:xmm[31:0], xmm[31:0]"                          , "RM"      , "F3 0F 10 /r"                  , "SSE"],
    ["movss"            , "W:xmm[31:0], m32"                                , "RM"      , "F3 0F 10 /r"                  , "SSE"],
    ["movsd"            , "w:xmm[63:0], xmm[63:0]"                          , "RM"      , "F2 0F 10 /r"                  , "SSE2"],
    ["movsd"            , "W:xmm[63:0], m64"                                , "RM"      , "F2 0F 10 /r"                  , "SSE2"],
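Collisions like these can be found mechanically by grouping rows on their encoding and opcode columns. The sketch below assumes the rows have been loaded as 5-tuples shaped like the ones quoted in this thread, and that prefixes which change the meaning (such as REX.W) are spelled out in the opcode column; it ignores the mode column (ANY vs. X64). `ROWS` and `find_ambiguous` are hypothetical names.

```python
from collections import defaultdict

# Rows copied from the tables quoted in this thread:
# (name, operands, encoding, opcode, attributes)
ROWS = [
    ("and", "X:r32/m32, id/ud", "MI", "81 /4 id", "ANY _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"),
    ("and", "X:r64, ud", "MI", "81 /4 id", "X64 _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"),
    ("and", "X:eax, id/ud", "I", "25 id", "ANY AltForm OF=0 SF=W ZF=W AF=U PF=W CF=0"),
    ("and", "X:rax, ud", "I", "25 id", "X64 AltForm OF=0 SF=W ZF=W AF=U PF=W CF=0"),
    ("movss", "w:xmm[31:0], xmm[31:0]", "RM", "F3 0F 10 /r", "SSE"),
    ("movss", "W:xmm[31:0], m32", "RM", "F3 0F 10 /r", "SSE"),
    ("movsd", "w:xmm[63:0], xmm[63:0]", "RM", "F2 0F 10 /r", "SSE2"),
    ("movsd", "W:xmm[63:0], m64", "RM", "F2 0F 10 /r", "SSE2"),
]


def find_ambiguous(rows):
    """Group rows by (encoding, opcode) and return every group with more than
    one row, i.e. byte patterns a decoder cannot map to a single row."""
    groups = defaultdict(list)
    for name, operands, encoding, opcode, _attrs in rows:
        groups[(encoding, opcode)].append(f"{name} {operands}")
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


for (encoding, opcode), forms in find_ambiguous(ROWS).items():
    print(f"{opcode} [{encoding}]: {forms}")
```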
Can you describe what is ambiguous in movss / movsd case?
The instructions really do what is described: movss|movsd from memory clears the rest of the register, while movss|movsd between registers doesn't (that's the W vs. w). x86 is full of such little differences. You can also see this in the AVX versions, vmovss and vmovsd; there are basically two versions of each instruction depending on whether it has a memory operand or not.
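The W vs. w difference can be modeled by treating an XMM register as a 128-bit integer. This is a sketch of the destination value only; `movs_reg` and `movs_mem` are hypothetical names, and `bits` is 32 for movss and 64 for movsd.

```python
MASK128 = (1 << 128) - 1


def movs_reg(dst: int, src: int, bits: int) -> int:
    """movss/movsd xmm, xmm ('w'): replace the low element, keep the rest of dst."""
    low = (1 << bits) - 1
    return (dst & ~low & MASK128) | (src & low)


def movs_mem(mem_value: int, bits: int) -> int:
    """movss xmm, m32 / movsd xmm, m64 ('W'): load the low element and zero
    the rest of the register; the old contents of dst do not matter."""
    return mem_value & ((1 << bits) - 1)


dst = MASK128        # destination with every bit set
one = 0x3F80_0000    # 1.0f as raw bits
print(movs_reg(dst, one, 32) >> 32 == MASK128 >> 32)  # True: bits 127:32 kept
print(movs_mem(one, 32) >> 32)                         # 0: bits 127:32 cleared
```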
I see. I think the problem is that I am currently mostly focused on the decoding part, while asmdb is more focused on encoding.
If I encounter F3 0F 10 xx xx ... I do not know which rule to choose based only on the bytes and the format ("RM").
This is similar to the ambiguity I reported with the "and" instructions further up.
What I have done on my side to deal with this:
- ignore the rules for
    "and", "X:rax, ud"
    "and", "X:r64, ud"
- change the movss/movsd rules slightly:
    movss  "w:xmm[31:0], xmm[31:0]"  "RM"  =>  ......  "Rr"
    movss  "W:xmm[31:0], m32"        "RM"  =>  ......  "Rm"
  where "r" is the M (ModRM.rm) operand restricted to a register and "m" is the M operand restricted to memory.
This gets rid of the ambiguity for me. Not sure whether this makes sense for asmdb, though.
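With the "Rr"/"Rm" split, a decoder can select the row from the ModRM.mod field alone: mod == 0b11 means the rm operand is a register, anything else means memory. A minimal sketch, assuming a hypothetical lookup table keyed by opcode bytes and rm kind, and no additional prefixes between F3/F2 and 0F.

```python
# Hypothetical decode table keyed by (opcode bytes, rm kind):
# "Rr" = ModRM.rm is a register, "Rm" = ModRM.rm is memory.
TABLE = {
    (b"\xF3\x0F\x10", "Rr"): "movss w:xmm[31:0], xmm[31:0]",
    (b"\xF3\x0F\x10", "Rm"): "movss W:xmm[31:0], m32",
    (b"\xF2\x0F\x10", "Rr"): "movsd w:xmm[63:0], xmm[63:0]",
    (b"\xF2\x0F\x10", "Rm"): "movsd W:xmm[63:0], m64",
}


def select_row(code: bytes) -> str:
    """Pick the table row for a 3-byte opcode followed by ModRM.

    Only ModRM.mod is inspected; SIB and displacement bytes of memory
    forms are irrelevant for choosing the row.
    """
    opcode, modrm = code[:3], code[3]
    kind = "Rr" if modrm >> 6 == 0b11 else "Rm"
    return TABLE[(opcode, kind)]


print(select_row(bytes([0xF3, 0x0F, 0x10, 0xC1])))  # movss w:xmm[31:0], xmm[31:0]
print(select_row(bytes([0xF3, 0x0F, 0x10, 0x01])))  # movss W:xmm[31:0], m32
```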