capstone-rs
capstone-rs copied to clipboard
[WIP] Compile-time Target Architecture Tags
This PR resolves #118, which proposes compile-time target architecture tags. It contains a series of commits that implement the proposal. Problems may arise as I author these commits so I'm opening this PR in an early stage to make sure that we can agree on the specific designs in the end.
Here's a list of unresolved problems:
- [x] Currently, there is no
CapstoneBuilderthat buildsCapstoneinstances whose target architectures are "really" unknown at compile-time. When we call functions likex86, we have explicitly tell that the target architecture isx86. We should implement a new builder that sets the target architecture via a runtimeArchvalue. - [x] Currently, for many supported architectures, their
RegIdandInsnGroupId(some architectures even theInsnId) are simply defined as a bunch ofu32values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that definingArchTag::RegIdandArchTag::InsnGroupIdis meaningful. - [x] ~After we support compile-time arch tags, many operations need not to return an
OptionorResultany more as they should always succeed given the exact target architecture. But forDynamicArchTagthese operations may still fail so we need to keep their return types as it is now. How do we resolve the conflict?~
Here's a TODO list to make this PR fully ready:
- [x] Update documentation.
- [ ] Update ~examples, benches and~ tests.
Currently, for many supported architectures, their
RegIdandInsnGroupId(some architectures even theInsnId) are simply defined as a bunch ofu32values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that definingArchTag::RegIdandArchTag::InsnGroupIdis meaningful.
I like your change which uses newtype_enum instead of the constified enum module! It's already a big improvement.
Bindgen didn't supported them when I first did this work. In fact, I contributed the "constified enum module" feature to bindgen (https://github.com/rust-lang/rust-bindgen/pull/741) to avoid the less ergonomic constified enum.
However, I think we may be able to do even better (although it might be more annoying). Ideally, we would expose actual Rust enums in the high-level bindings. That way, users could take advantage of the exhaustive pattern matching with match. The Capstone C code makes sure that 0 is always an "invalid" value such as ARM_SYSREG_INVALID. When converting from the C value to the Rust value, we could map any invalid value to an "invalid" variant of the enum OR always return an Option<T> when converting and return None if we get an invalid value.
// Have internal "invalid" variant
enum ArmReg {
Invalid = 0,
APSR,
APSR_NZCV,
// ...
}
// No invalid variant, so converting gives `Option<ArmReg>`
enum ArmReg {
APSR = 1,
APSR_NZCV,
// ...
}
You don't need to implement this "Rust enum" change as a part of this PR--I just thought I'd let you know what I was thinking.
Ideally, we would expose actual Rust enums in the high-level bindings.
Actually, I tried this at the beginning when writing the second commit. However, I quickly ran into problems.
Take ARM64 registers as an example. The problem is that there are actually 2 underlying C enums that both represent valid ARM64 reg IDs: arm64_reg and arm64_sysreg. On the Rust side, each architecture can only have one Rust enum that represents an architecture register. Unfortunately, bindgen cannot merge two C enums into one big Rust enum. Also converting Arm64Sysreg into Arm64Reg won't work because the underlying value ranges of Arm64Sysreg and Arm64Reg are different. Thus the problem cannot be solved without considerable manual efforts if we generate Rust enums solely.
On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an Arm64Sysreg into an Arm64Reg by manually implementing a From:
impl From<Arm64Sysreg> for Arm64Reg {
fn from(sysreg: Arm64Sysreg): Self {
Self(sysreg.0)
}
}
Codecov Report
Merging #143 (7d8ab8c) into master (74ccb09) will decrease coverage by
7.49%. Report is 1 commits behind head on master. The diff coverage is91.01%.
:exclamation: Current head 7d8ab8c differs from pull request most recent head 5cd9258. Consider uploading reports for the commit 5cd9258 to get more accurate results
@@ Coverage Diff @@
## master #143 +/- ##
==========================================
- Coverage 93.99% 86.51% -7.49%
==========================================
Files 22 18 -4
Lines 2733 1379 -1354
Branches 2687 1377 -1310
==========================================
- Hits 2569 1193 -1376
- Misses 164 186 +22
| Files | Coverage Δ | |
|---|---|---|
| capstone-rs/examples/demo.rs | 90.38% <100.00%> (ø) |
|
| capstone-rs/examples/parallel.rs | 95.23% <100.00%> (ø) |
|
| capstone-rs/src/arch/evm.rs | 100.00% <ø> (+7.69%) |
:arrow_up: |
| capstone-rs/src/arch/m680x.rs | 93.38% <100.00%> (-2.55%) |
:arrow_down: |
| capstone-rs/src/arch/ppc.rs | 90.32% <100.00%> (-3.13%) |
:arrow_down: |
| capstone-rs/src/arch/tms320c64x.rs | 89.91% <100.00%> (-7.52%) |
:arrow_down: |
| capstone-rs/src/instruction.rs | 97.93% <100.00%> (+5.43%) |
:arrow_up: |
| capstone-rs/src/lib.rs | 100.00% <ø> (ø) |
|
| capstone-rs/src/arch/mips.rs | 67.74% <88.88%> (-28.26%) |
:arrow_down: |
| capstone-rs/src/arch/mod.rs | 89.55% <75.00%> (-3.25%) |
:arrow_down: |
| ... and 7 more |
... and 4 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an
Arm64Sysreginto anArm64Regby manually implementing aFrom:
Makes sense to me. If we want "handcrafted enums" that are more ergonomic, it would certainly be a separate PR.
I've updated some source code documentation to reflect the new architecture tag. However, I don't add a systematic overview or tutorial about the architecture tag as I'm not sure whether that's necessary. @tmfink Please have a look and see if the documentation can be further improved.
The examples and benches are already up-to-date with the latest API design. I'm checking whether the test coverage can be further improved and after that I belive this PR is ready to be formally reviewed and merged.