capstone-rs icon indicating copy to clipboard operation
capstone-rs copied to clipboard

[WIP] Compile-time Target Architecture Tags

Open Lancern opened this issue 2 years ago • 9 comments

This PR resolves #118, which proposes compile-time target architecture tags. It contains a series of commits that implement the proposal. Problems may arise as I author these commits so I'm opening this PR in an early stage to make sure that we can agree on the specific designs in the end.

Here's a list of unresolved problems:

  • [x] Currently, there is no CapstoneBuilder that builds Capstone instances whose target architectures are "really" unknown at compile-time. When we call functions like x86, we have explicitly tell that the target architecture is x86. We should implement a new builder that sets the target architecture via a runtime Arch value.
  • [x] Currently, for many supported architectures, their RegId and InsnGroupId (some architectures even the InsnId) are simply defined as a bunch of u32 values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that defining ArchTag::RegId and ArchTag::InsnGroupId is meaningful.
  • [x] ~After we support compile-time arch tags, many operations need not to return an Option or Result any more as they should always succeed given the exact target architecture. But for DynamicArchTag these operations may still fail so we need to keep their return types as it is now. How do we resolve the conflict?~

Here's a TODO list to make this PR fully ready:

  • [x] Update documentation.
  • [ ] Update ~examples, benches and~ tests.

Lancern avatar Aug 06 '23 14:08 Lancern

Currently, for many supported architectures, their RegId and InsnGroupId (some architectures even the InsnId) are simply defined as a bunch of u32 values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that defining ArchTag::RegId and ArchTag::InsnGroupId is meaningful.

I like your change which uses newtype_enum instead of the constified enum module! It's already a big improvement.

Bindgen didn't supported them when I first did this work. In fact, I contributed the "constified enum module" feature to bindgen (https://github.com/rust-lang/rust-bindgen/pull/741) to avoid the less ergonomic constified enum.

However, I think we may be able to do even better (although it might be more annoying). Ideally, we would expose actual Rust enums in the high-level bindings. That way, users could take advantage of the exhaustive pattern matching with match. The Capstone C code makes sure that 0 is always an "invalid" value such as ARM_SYSREG_INVALID. When converting from the C value to the Rust value, we could map any invalid value to an "invalid" variant of the enum OR always return an Option<T> when converting and return None if we get an invalid value.

// Have internal "invalid" variant
enum ArmReg {
    Invalid = 0,
    APSR,
    APSR_NZCV,
    // ...
}
// No invalid variant, so converting gives `Option<ArmReg>`
enum ArmReg {
    APSR = 1,
    APSR_NZCV,
    // ...
}

You don't need to implement this "Rust enum" change as a part of this PR--I just thought I'd let you know what I was thinking.

tmfink avatar Aug 21 '23 05:08 tmfink

Ideally, we would expose actual Rust enums in the high-level bindings.

Actually, I tried this at the beginning when writing the second commit. However, I quickly ran into problems.

Take ARM64 registers as an example. The problem is that there are actually 2 underlying C enums that both represent valid ARM64 reg IDs: arm64_reg and arm64_sysreg. On the Rust side, each architecture can only have one Rust enum that represents an architecture register. Unfortunately, bindgen cannot merge two C enums into one big Rust enum. Also converting Arm64Sysreg into Arm64Reg won't work because the underlying value ranges of Arm64Sysreg and Arm64Reg are different. Thus the problem cannot be solved without considerable manual efforts if we generate Rust enums solely.

On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an Arm64Sysreg into an Arm64Reg by manually implementing a From:

impl From<Arm64Sysreg> for Arm64Reg {
    fn from(sysreg: Arm64Sysreg): Self {
        Self(sysreg.0)
    }
}

Lancern avatar Aug 21 '23 12:08 Lancern

Codecov Report

Merging #143 (7d8ab8c) into master (74ccb09) will decrease coverage by 7.49%. Report is 1 commits behind head on master. The diff coverage is 91.01%.

:exclamation: Current head 7d8ab8c differs from pull request most recent head 5cd9258. Consider uploading reports for the commit 5cd9258 to get more accurate results

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
- Coverage   93.99%   86.51%   -7.49%     
==========================================
  Files          22       18       -4     
  Lines        2733     1379    -1354     
  Branches     2687     1377    -1310     
==========================================
- Hits         2569     1193    -1376     
- Misses        164      186      +22     
Files Coverage Δ
capstone-rs/examples/demo.rs 90.38% <100.00%> (ø)
capstone-rs/examples/parallel.rs 95.23% <100.00%> (ø)
capstone-rs/src/arch/evm.rs 100.00% <ø> (+7.69%) :arrow_up:
capstone-rs/src/arch/m680x.rs 93.38% <100.00%> (-2.55%) :arrow_down:
capstone-rs/src/arch/ppc.rs 90.32% <100.00%> (-3.13%) :arrow_down:
capstone-rs/src/arch/tms320c64x.rs 89.91% <100.00%> (-7.52%) :arrow_down:
capstone-rs/src/instruction.rs 97.93% <100.00%> (+5.43%) :arrow_up:
capstone-rs/src/lib.rs 100.00% <ø> (ø)
capstone-rs/src/arch/mips.rs 67.74% <88.88%> (-28.26%) :arrow_down:
capstone-rs/src/arch/mod.rs 89.55% <75.00%> (-3.25%) :arrow_down:
... and 7 more

... and 4 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 17 '23 03:09 codecov[bot]

On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an Arm64Sysreg into an Arm64Reg by manually implementing a From:

Makes sense to me. If we want "handcrafted enums" that are more ergonomic, it would certainly be a separate PR.

tmfink avatar Oct 16 '23 05:10 tmfink

I've updated some source code documentation to reflect the new architecture tag. However, I don't add a systematic overview or tutorial about the architecture tag as I'm not sure whether that's necessary. @tmfink Please have a look and see if the documentation can be further improved.

The examples and benches are already up-to-date with the latest API design. I'm checking whether the test coverage can be further improved and after that I belive this PR is ready to be formally reviewed and merged.

Lancern avatar Oct 16 '23 15:10 Lancern