Size of immediate values is not exposed in the API
Capstone 4.0.1 (and next@295513f5bc, but I don't have cstool from this revision) reports the operand size for immediate values, whereas the encoded size of the immediate itself would be more useful.
For example, in the case of the JMP instruction, the architecture manual specifies "For near jumps in 64-bit mode, the operand size defaults to 64 bits". This is reported as follows:
$ cstool -d x64 'eb 00'
0 eb 00 jmp 2
operands[0].type: IMM = 0x2
operands[0].size: 8
$ cstool -d x64 'e9 00 00 00 00'
0 e9 00 00 00 00 jmp 5
operands[0].type: IMM = 0x5
operands[0].size: 8
In both cases the operand size is 8 bytes and there's no way to distinguish "JMP rel8off" from "JMP rel32off".
Decoding the ADD instruction provides similar results:
$ cstool -d x64 '49 83 c0 02'
0 49 83 c0 02 add r8, 2
operands[1].type: IMM = 0x2
operands[1].size: 8
$ cstool -d x64 '49 81 c0 00 00 00 01'
0 49 81 c0 00 00 00 01 add r8, 0x1000000
operands[1].type: IMM = 0x1000000
operands[1].size: 8
The reported operand sizes do conform to the manuals ("The default operand size for most instructions is 32 bits, and a REX prefix must be used to change the operand size to 64 bits."), and a REX prefix (49) is used here, but it is not possible to tell "ADD reg/mem64, imm8" apart from "ADD reg/mem64, imm32".
From my point of view, how the CPU determines the internal operand size is not very interesting. Knowing how the instruction is encoded would be much more useful, as it has a real impact on performance estimates. On Skylake, for example, there is a difference in execution speed between JMP with 8-bit and 32-bit offsets:
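To make the distinction concrete, here is a minimal sketch of what the raw bytes already reveal. It is not part of Capstone: `encoded_form` is a hypothetical helper that recognises only the four encodings quoted above and assumes 64-bit mode with no legacy prefixes.

```python
def encoded_form(code: bytes):
    """Return (pattern, immediate width in bytes), or None if not recognised.

    Hypothetical helper: handles only the JMP and ADD encodings from this issue.
    """
    pos = 0
    rex_w = False
    if 0x40 <= code[pos] <= 0x4F:        # REX prefix in 64-bit mode
        rex_w = bool(code[pos] & 0x08)   # REX.W selects the 64-bit operand size
        pos += 1
    opcode = code[pos]
    if opcode == 0xEB:
        return ("JMP rel8off", 1)
    if opcode == 0xE9:
        return ("JMP rel32off", 4)
    if opcode in (0x81, 0x83) and rex_w:
        modrm = code[pos + 1]
        if ((modrm >> 3) & 0x7) == 0:    # ModRM.reg == 0 selects ADD in this opcode group
            if opcode == 0x83:
                return ("ADD reg/mem64, imm8", 1)
            return ("ADD reg/mem64, imm32", 4)
    return None


for hexstr in ("eb 00", "e9 00 00 00 00", "49 83 c0 02", "49 81 c0 00 00 00 01"):
    print(hexstr, "->", encoded_form(bytes.fromhex(hexstr)))
```

The opcode byte alone separates the short and long forms in each pair, which is exactly the information the reported operand size hides.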
https://uops.info/html-tp/SKL/JMP_Rel8-Measurements.html https://uops.info/html-tp/SKL/JMP_Rel32-Measurements.html
Edit:
I have since learned that I overlooked something: you can get the correct immediate size from the CsInsn.imm_size field instead of CsInsn.operands[1].size. The latter seems to always be a constant value, whereas the former is the actual size of the immediate as encoded in the instruction.
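A minimal sketch of that approach, assuming the Python bindings expose `imm_size` on x86 instructions once `md.detail` is enabled, and assuming it is 0 when no immediate is encoded. `pattern_suffix` is a hypothetical helper name, and the live demo is skipped if Capstone is not installed.

```python
try:
    from capstone import Cs, CS_ARCH_X86, CS_MODE_64
except ImportError:  # Capstone not installed: only the helper below is defined
    Cs = None


def pattern_suffix(insn):
    """Return e.g. 'imm32' based on the encoded immediate width, or None.

    Hypothetical helper: relies on insn.imm_size holding the width in bytes
    (assumed to be 0 when the instruction carries no immediate).
    """
    return f"imm{insn.imm_size * 8}" if insn.imm_size else None


if Cs is not None:
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    md.detail = True  # the encoding details are only filled in with detail mode on
    # sub rsp, 0xC8 encoded with a 32-bit immediate
    for insn in md.disasm(bytes.fromhex("4881ecc8000000"), 0):
        print(insn.mnemonic, insn.op_str, pattern_suffix(insn))
```

Keeping the helper separate from the disassembly loop makes it easy to reuse when building instruction patterns for many instructions.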
Original Post:
Bump to this issue. Right now my company has code which requires us to generate Intel-style "instruction patterns" like "MOV r32, imm8" for each instruction, and in the case of the x86_64 instruction sub rsp, 0xC8 (bytes: 48 81 ec c8 00 00 00) Capstone yields the incorrect immediate size for the source operand.
Using the Python API like this:
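from capstone import Cs, CS_ARCH_X86, CS_MODE_64

# Disassemble "sub rsp, 0xC8" in 64-bit mode with operand details enabled
md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True
disasm_list = list(md.disasm(b'\x48\x81\xec\xc8\x00\x00\x00', 0))
instr = disasm_list[0]
operand_src = instr.operands[1]  # the immediate (source) operand
print(operand_src.size)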
will produce an operand size of 8 for the immediate used in this SUB instruction, whereas Zydis (and decoding the instruction by hand) yields an operand size of 4 bytes (i.e. 32 bits) for it:
== [ BASIC ] ============================================================================================
MNEMONIC: sub [ENC: DEFAULT, MAP: DEFAULT, OPC: 0x81]
LENGTH: 7
SSZ: 64
EOSZ: 64
EASZ: 64
CATEGORY: BINARY
ISA-SET: I86
ISA-EXT: BASE
EXCEPTIONS: NONE
ATTRIBUTES: HAS_MODRM HAS_REX CPUFLAG_ACCESS
OPTIMIZED: 48 81 EC C8 00 00 00
== [ OPERANDS ] ============================================================================================
##  TYPE       VISIBILITY  ACTION  ENCODING      SIZE  NELEM  ELEMSZ  ELEMTYPE  VALUE
--  ---------  ----------  ------  ------------  ----  -----  ------  --------  ---------------------------
 0  REGISTER   EXPLICIT    RW      MODRM_RM        64      1      64  INT       rsp
 1  IMMEDIATE  EXPLICIT    R       SIMM16_32_32    32      1      32  INT       [S A 32] 0x00000000000000C8
 2  REGISTER   HIDDEN      W       NONE            64     64       1  INT       rflags
--  ---------  ----------  ------  ------------  ----  -----  ------  --------  ---------------------------
== [ FLAGS ] ============================================================================================
ACTIONS: [CF : M ] [PF : M ] [AF : M ] [ZF : M ] [SF : M ] [OF : M ]
READ: 0x00000000
WRITTEN: 0x000008D5
== [ ATT ] ============================================================================================
ABSOLUTE: sub $0xC8, %rsp
RELATIVE: sub $0xC8, %rsp
== [ INTEL ] ============================================================================================
ABSOLUTE: sub rsp, 0xC8
RELATIVE: sub rsp, 0xC8
== [ SEGMENTS ] ============================================================================================
48 81 EC C8 00 00 00
:  :  :  :..IMM
:  :  :..MODRM
:  :..OPCODE
:..REX
The painful part is that there seems to be no real recourse if you need the correct size of an immediate operand. A good workaround would be for the API to expose the bytes that encode a given immediate operand, so that you could simply use the byte count; but to my knowledge the API does not expose that instruction detail.
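For illustration, here is a sketch of that byte-count idea, assuming the offset and width of the immediate within the instruction are known. Capstone's x86 encoding details appear to carry this as `imm_offset` and `imm_size` (the latter is the field mentioned in the edit above). `immediate_bytes` is a hypothetical helper, and the values 3 and 4 are read off the Zydis SEGMENTS layout above (REX, OPCODE, MODRM, then a 4-byte IMM).

```python
def immediate_bytes(raw: bytes, imm_offset: int, imm_size: int) -> bytes:
    """Slice the encoded immediate out of the raw instruction bytes.

    Hypothetical helper: imm_offset and imm_size are supplied by the caller,
    e.g. from a disassembler's encoding details.
    """
    return raw[imm_offset:imm_offset + imm_size]


raw = bytes.fromhex("48 81 ec c8 00 00 00")  # sub rsp, 0xC8
imm = immediate_bytes(raw, imm_offset=3, imm_size=4)

# x86 immediates are little-endian; this one is sign-extended to 64 bits by the CPU
value = int.from_bytes(imm, "little", signed=True)
print(len(imm), hex(value))
```

The byte count of the slice is the encoded immediate size, independent of the 64-bit operand size the instruction operates on.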
Apologies for the ping, but would a maintainer (i.e. @Rot127 ) mind adding tags to this issue so that it can be indexed? I almost made a new issue because this one is over 5 years old.
Added the labels. You might be interested in this discussion about replacing our x86 module internally with Zydis.