Instruction RWInfo incomplete/incorrect in some cases
There are some instructions that have hidden operands such as call/push/pop that would read/write the stack pointer. There are also some instructions where I remember the information being not entirely correct.
Heres what I have discovered so far, there might be more.
switch (ins->id())
{
case x86::Inst::kIdSahf:
regsRead.add(x86::rax);
break;
case x86::Inst::kIdImul:
if (ins->opCount() == 1)
{
regsWrite.add(x86::rax);
regsRead.add(x86::rdx);
}
break;
case x86::Inst::kIdRdtscp:
regsWrite.add(x86::rax);
regsWrite.add(x86::rdx);
regsWrite.add(x86::rcx);
break;
case x86::Inst::kIdRdtsc:
regsWrite.add(x86::rax);
regsWrite.add(x86::rdx);
break;
case x86::Inst::kIdDiv:
case x86::Inst::kIdMul:
regsWrite.add(x86::rax);
regsWrite.add(x86::rdx);
regsRead.add(x86::rax);
regsRead.add(x86::rdx);
break;
case x86::Inst::kIdLahf:
regsWrite.add(x86::rax);
regsRead.add(x86::rax);
break;
case x86::Inst::kIdCmpxchg8b:
case x86::Inst::kIdCmpxchg16b:
regsRead.add(x86::rdx);
regsRead.add(x86::rax);
regsRead.add(x86::rcx);
regsRead.add(x86::rbx);
regsWrite.add(x86::rdx);
regsWrite.add(x86::rax);
break;
case x86::Inst::kIdCmpxchg:
regsRead.add(x86::rax);
regsWrite.add(x86::rax);
break;
case x86::Inst::kIdCbw:
case x86::Inst::kIdCwde:
case x86::Inst::kIdCdqe:
regsRead.add(x86::rax);
regsWrite.add(x86::rax);
break;
case x86::Inst::kIdCwd:
case x86::Inst::kIdCdq:
case x86::Inst::kIdCqo:
regsRead.add(x86::rax);
regsWrite.add(x86::rdx);
break;
case x86::Inst::kIdPush:
case x86::Inst::kIdPop:
case x86::Inst::kIdRet:
case x86::Inst::kIdPushfq:
case x86::Inst::kIdPushfd:
case x86::Inst::kIdPushf:
case x86::Inst::kIdPopfq:
case x86::Inst::kIdPopfd:
case x86::Inst::kIdPopf:
regsRead.add(x86::rsp);
regsWrite.add(x86::rsp);
break;
case x86::Inst::kIdLods:
regsWrite.add(x86::rsi);
regsRead.add(x86::rsi);
regsWrite.add(x86::rax);
break;
case x86::Inst::kIdStos:
regsWrite.add(x86::rsi);
regsRead.add(x86::rsi);
regsRead.add(x86::rax);
break;
case x86::Inst::kIdMovs:
regsWrite.add(x86::rsi);
regsWrite.add(x86::rdi);
regsRead.add(x86::rsi);
regsRead.add(x86::rdi);
break;
case x86::Inst::kIdInt:
case x86::Inst::kIdInt3:
case x86::Inst::kIdCpuid:
regsRead.add(x86::rax);
regsRead.add(x86::rcx);
regsWrite.add(x86::rax);
regsWrite.add(x86::rcx);
regsWrite.add(x86::rdx);
regsWrite.add(x86::rbx);
break;
}
// Repeat types
if ((ins->instOptions() & x86::Inst::kOptionRep) || (ins->instOptions() & x86::Inst::kOptionRepne))
{
switch (ins->id())
{
case x86::Inst::kIdStos:
case x86::Inst::kIdLods:
case x86::Inst::kIdCmps:
case x86::Inst::kIdScas:
case x86::Inst::kIdOuts:
case x86::Inst::kIdIns:
case x86::Inst::kIdMovs:
regsWrite.add(x86::rcx);
regsRead.add(x86::rcx);
break;
}
}
Its possible that some things got fixed but most of them still lack the correct information.
I also have this code for flags
if (ins->id() == asmjit::x86::Inst::kIdRcl || ins->id() == asmjit::x86::Inst::kIdRcr)
flagsR &= ~(1 << 11);
So thats probably wrong too, most of this is was tested against Zydis Disassembler.
I think we will really need an API to convert an instruction into some canonical form, and then we will need to add hidden registers to the RW query, because now the query only considers only the operands it sees.
For the Compiler infrastructure the current design works well, because Compiler by itself cannot manage anything that would be hidden, so only explicit forms of instructions work. However, Assembler/Builder interfaces don't have such limitation and anyone can use implicit forms of instructions, which is actually what is the biggest problem in this case.
I'm kinda undecided whether supporting implicit forms was a mistake or whether asmjit should continue supporting such forms, or whether it should automatically make all inputs canonical (explicit) when RW query is used. That doesn't solve hidden registers though, so the API has to be extended in any way.