capstone icon indicating copy to clipboard operation
capstone copied to clipboard

Increased size of cs_isn.bytes to 33 to be able to hold big EVM instructions

Open f0rki opened this issue 7 years ago • 12 comments

The EVM arch has a couple of really big instructions, with the biggest one being PUSH32 (with 33 bytes size). Currently cs_inst can only hold 16 bytes of raw instruction data, which is not enough for EVM. This results in the bytes being truncated to 16 bytes. This didn't really affect the disassembly or the op_str, but if you use cs_inst.size or cs_inst.bytes afterwards, it would return wrong data for those big instructions (PUSH15 up to PUSH32).

f0rki avatar Aug 08 '18 14:08 f0rki

33 looks weird though, should we round the size up to some other number?

aquynh avatar Aug 09 '18 15:08 aquynh

I don't know. It already feels like a lot of wasted memory for all the other archs. Maybe it should also be surrounded by #ifdef HAS_EVM and fall back to the 16 byte size?

f0rki avatar Aug 09 '18 19:08 f0rki

I don't know. It already feels like a lot of wasted memory for all the other archs. Maybe it should also be surrounded by #ifdef HAS_EVM and fall back to the 16 byte size?

it's a good idea, can you do this continue?

kabeor avatar Nov 10 '21 12:11 kabeor

  • master branch is not really used, new development should go into next branch (#1759)
  • Changing the size of a field in cs_insn is an ABI break, meaning a change like this should only go into next branch (and not backported to older branches)
    • As a consequence, I don't think we want the memory layout of cs_insn to change as a result of something like HAS_EVM. Otherwise, the ABI used in a libcapstone.so would depend on whether the library was compiled with HAS_EVM. Any headers distributed headers and and all language bindings would need to know how the library was compiled.

tmfink avatar Nov 11 '21 06:11 tmfink

yes, plz rebase your branch to 'next'. 'master' is not accept to PR

  • As a consequence, I don't think we want the memory layout of cs_insn to change as a result of something like HAS_EVM. Otherwise, the ABI used in a libcapstone.so would depend on whether the library was compiled with HAS_EVM. Any headers distributed headers and and all language bindings would need to know how the library was compiled.

@aquynh any thoughts?

kabeor avatar Nov 11 '21 06:11 kabeor

I rebased to next.

f0rki avatar Nov 11 '21 13:11 f0rki

i hesitate on this PR since it would break the API :-(

aquynh avatar Nov 11 '21 15:11 aquynh

cs_isn.bytes size is internal only no ? why it would break the API ?

sylvainpelissier avatar Jan 31 '22 17:01 sylvainpelissier

Alternatively: Would it make sense to move the opcode constant to the cs_evm struct?

f0rki avatar Feb 01 '22 09:02 f0rki

cs_isn.bytes size is internal only no ? why it would break the API ?

cs_insn is in capstone.h, which is a public header. Changing the bytes field changes the size of cs_insn which is an ABI break.

tmfink avatar Feb 01 '22 19:02 tmfink

Will it make more sense to declare the field bytes to the very end of struct cs_insn as a dynamic-length array? Something like:

struct cs_insn {
  // other fields..
  uint32_t len_bytes; // length of "bytes"
  uint8_t bytes[];
};

Or alternatively declaring bytes as a pointer just like the detail field.

The length changing of bytes was the reason that Capstone has to bump from 4 to 5 (see https://github.com/capstone-engine/capstone/issues/1315#issuecomment-458441705), maybe it's better to consider a solution instead of bumping major versions because of "minor" fixes.

david942j avatar Mar 27 '22 16:03 david942j

The length changing of bytes was the reason that Capstone has to bump from 4 to 5 (see #1315 (comment)), maybe it's better to consider a solution instead of bumping major versions because of "minor" fixes.

Adding an extra pointer indirection/allocation is less efficient but could help maintain ABI compatibility. It depends on what we want to do.

tmfink avatar Mar 28 '22 09:03 tmfink

Thank you for the PR! I closed it because it is out of date. With the new auto-sync update for v6 we made many changes to some main architectures and will do also to others. This also changed the requirements we have now for new PRs.

If you still want to merge the changes, please rebase your fix onto the newest next branch and open a new PR.

Rot127 avatar Mar 20 '24 09:03 Rot127