Size of immediate values is not exposed in the API

Open wolfpld opened this issue 5 years ago • 2 comments

Capstone 4.0.1 (and next@295513f5bc, but I don't have cstool from this revision) reports the operand size for immediate operands, whereas the encoded size of the immediate value itself would be more useful.

For example, in the case of the JMP instruction, the architecture manual specifies "For near jumps in 64-bit mode, the operand size defaults to 64 bits". This is reported as follows:

$ cstool -d x64 'eb 00'
 0  eb 00                                            jmp        2
                operands[0].type: IMM = 0x2
                operands[0].size: 8
$ cstool -d x64 'e9 00 00 00 00'
 0  e9 00 00 00 00                                   jmp        5
                operands[0].type: IMM = 0x5
                operands[0].size: 8

In both cases the operand size is 8 bytes and there's no way to distinguish "JMP rel8off" from "JMP rel32off".

Decoding the ADD instruction provides similar results:

$ cstool -d x64 '49 83 c0 02'
 0  49 83 c0 02                                      add        r8, 2
                operands[1].type: IMM = 0x2
                operands[1].size: 8
$ cstool -d x64 '49 81 c0 00 00 00 01'
 0  49 81 c0 00 00 00 01                             add        r8, 0x1000000
                operands[1].type: IMM = 0x1000000
                operands[1].size: 8

The reported operand sizes are consistent with the manuals ("The default operand size for most instructions is 32 bits, and a REX prefix must be used to change the operand size to 64 bits."), since a REX prefix (0x49, which has REX.W set) is used here. However, it's not possible to tell "ADD reg/mem64, imm8" apart from "ADD reg/mem64, imm32".
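
To make the distinction concrete, here is a minimal sketch, independent of Capstone, that recovers the encoded immediate width for the four byte sequences above. The table `IMM_SIZE` and the function `encoded_imm_size` are hypothetical names made up for this illustration. It assumes 64-bit mode, at most one REX prefix, no 0x66 prefix, and register-direct ModRM, so it is nowhere near a general decoder.

```python
# Hypothetical helper, not part of Capstone. It only knows the opcodes
# discussed in this issue: EB, E9, 83 /r and 81 /r (64-bit mode).
IMM_SIZE = {
    0xEB: (False, 1),  # JMP rel8:  no ModRM byte, 1-byte offset
    0xE9: (False, 4),  # JMP rel32: no ModRM byte, 4-byte offset
    0x83: (True, 1),   # group 1 r/m, imm8 (sign-extended)
    0x81: (True, 4),   # group 1 r/m, imm32 (assumes no 0x66 prefix)
}


def encoded_imm_size(code: bytes) -> int:
    """Return the number of immediate bytes actually present in `code`."""
    pos = 0
    if 0x40 <= code[pos] <= 0x4F:  # skip a single REX prefix
        pos += 1
    has_modrm, imm_size = IMM_SIZE[code[pos]]
    pos += 1
    if has_modrm:
        # mod == 0b11 means register-direct, so no SIB or displacement follows
        if code[pos] >> 6 != 0b11:
            raise ValueError("only register-direct ModRM is handled here")
        pos += 1
    if len(code) - pos != imm_size:
        raise ValueError("unexpected instruction length")
    return imm_size


for sample in ("eb 00", "e9 00 00 00 00", "49 83 c0 02", "49 81 c0 00 00 00 01"):
    size = encoded_imm_size(bytes.fromhex(sample))
    print(f"{sample}: {size}-byte immediate")
```

This reports 1, 4, 1 and 4 bytes for the four samples, which is the information the operand size field does not convey.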

From my point of view it's not very interesting to know how the CPU determines the internal operand size. Knowing how the instruction is encoded would be much more useful, as it has a real impact on estimating performance figures. For example, on Skylake there's a difference in execution speed between JMP with 8-bit and 32-bit offsets:

https://uops.info/html-tp/SKL/JMP_Rel8-Measurements.html
https://uops.info/html-tp/SKL/JMP_Rel32-Measurements.html

wolfpld, Apr 27 '20 00:04

Edit: I've since learned that I had overlooked the CsInsn.imm_size field, which gives the right immediate operand size where CsInsn.operands[1].size does not. It seems as though the latter is always just a constant value, whereas the former represents the actual size of the immediate in the instruction.
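
For anyone who needs the immediate bytes themselves and not just their count, here is a minimal sketch of the slicing step. Only `imm_size` is confirmed in this thread. The companion `imm_offset` attribute is an assumption on my part about the bindings, so check it against your Capstone version. The example uses a hand-built stand-in object and not a real CsInsn, with values read off the Zydis segment breakdown in the original post below (REX, opcode, ModRM, then the immediate starting at byte 3). `immediate_bytes` is a hypothetical helper name.

```python
from types import SimpleNamespace


def immediate_bytes(insn) -> bytes:
    """Slice out the bytes that encode the immediate of a decoded instruction.

    `insn` is anything exposing `bytes`, `imm_offset` and `imm_size`
    (assumed here to match what a Capstone x86 instruction provides).
    """
    start = insn.imm_offset
    return bytes(insn.bytes[start:start + insn.imm_size])


# Stand-in for a decoded `sub rsp, 0xC8`. The offset and size are written
# by hand for illustration; they were NOT produced by Capstone.
sub_rsp = SimpleNamespace(
    bytes=bytes.fromhex("4881ecc8000000"),
    imm_offset=3,
    imm_size=4,
)

raw = immediate_bytes(sub_rsp)
# x86 immediates are little-endian; this one is sign-extended to 64 bits.
value = int.from_bytes(raw, "little", signed=True)
print(f"imm{8 * len(raw)} = {value:#x}")
```

With the stand-in above this prints `imm32 = 0xc8`, which is the width an Intel-style pattern such as "SUB r/m64, imm32" needs.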

Original Post: Bump to this issue. Right now my company has code which requires us to generate Intel-style "instruction patterns" like "MOV r32, imm8" for each instruction, and in the case of the x86_64 instruction sub rsp, 0xC8 (bytes: 48 81 ec c8 00 00 00) Capstone yields the incorrect immediate size for the source operand.

Using the Python API like this:

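from capstone import Cs, CS_ARCH_X86, CS_MODE_64

md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True  # operand details are only populated in detail mode
disasm_list = list(md.disasm(b'\x48\x81\xec\xc8\x00\x00\x00', 0))
instr = disasm_list[0]
operand_src = instr.operands[1]  # the immediate (source) operand
print(operand_src.size)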

This prints an operand size of 8 for the immediate used in this SUB instruction, whereas Zydis (and decoding the instruction by hand) yields a size of 4 bytes (i.e. 32 bits) for it:

== [    BASIC ] ============================================================================================
   MNEMONIC: sub [ENC: DEFAULT, MAP: DEFAULT, OPC: 0x81]
     LENGTH:  7
        SSZ: 64
       EOSZ: 64
       EASZ: 64
   CATEGORY: BINARY
    ISA-SET: I86
    ISA-EXT: BASE
 EXCEPTIONS: NONE
 ATTRIBUTES: HAS_MODRM HAS_REX CPUFLAG_ACCESS
  OPTIMIZED: 48 81 EC C8 00 00 00

== [ OPERANDS ] ============================================================================================
##       TYPE  VISIBILITY  ACTION      ENCODING   SIZE  NELEM  ELEMSZ  ELEMTYPE                        VALUE
--  ---------  ----------  ------  ------------   ----  -----  ------  --------  ---------------------------
 0   REGISTER    EXPLICIT      RW      MODRM_RM     64      1      64       INT                          rsp
 1  IMMEDIATE    EXPLICIT       R  SIMM16_32_32     32      1      32       INT  [S A 32] 0x00000000000000C8
 2   REGISTER      HIDDEN       W          NONE     64     64       1       INT                       rflags
--  ---------  ----------  ------  ------------   ----  -----  ------  --------  ---------------------------

== [    FLAGS ] ============================================================================================
    ACTIONS: [CF  : M  ] [PF  : M  ] [AF  : M  ] [ZF  : M  ] [SF  : M  ] [OF  : M  ]
       READ: 0x00000000
    WRITTEN: 0x000008D5

== [      ATT ] ============================================================================================
   ABSOLUTE: sub $0xC8, %rsp
   RELATIVE: sub $0xC8, %rsp

== [    INTEL ] ============================================================================================
   ABSOLUTE: sub rsp, 0xC8
   RELATIVE: sub rsp, 0xC8

== [ SEGMENTS ] ============================================================================================
48 81 EC C8 00 00 00
:  :  :  :..IMM
:  :  :..MODRM
:  :..OPCODE
:..REX

The painful part is that there seems to be no real recourse if you need the correct size of immediate operands. A good workaround might be for the API to expose the bytes that make up a given immediate operand, so that you could just use the byte count; but to my knowledge the API does not expose that instruction detail.

Apologies for the ping, but would a maintainer (i.e. @Rot127 ) mind adding tags to this issue so that it can be indexed? I almost made a new issue because this one is over 5 years old.

calware, Sep 04 '25 07:09

Added the labels. You might be interested in this discussion about replacing our x86 module internally with Zydis.

Rot127, Sep 04 '25 12:09