capstone-rs
capstone-rs copied to clipboard
regs_write() and regs_read() do not return complete results
The instruction detail provided via InsnDetail
's regs_write()
and regs_read()
only return a subset of the registers written to or read from.
For example, running the C-based cstool
:
$ cstool -d thumb "f0bd"
0 f0 bd pop {r4, r5, r6, r7, pc}
ID: 423 (pop)
op_count: 5
operands[0].type: REG = r4
operands[0].access: WRITE
operands[1].type: REG = r5
operands[1].access: WRITE
operands[2].type: REG = r6
operands[2].access: WRITE
operands[3].type: REG = r7
operands[3].access: WRITE
operands[4].type: REG = pc
operands[4].access: WRITE
Registers read: sp
Registers modified: sp r4 r5 r6 r7 pc
Groups: thumb thumb1only
In contrast, running the capstone-rs-based cstool
:
$ cstool -d --arch arm --mode thumb --code "f0bd"
1000: f0 bd pop {r4, r5, r6, r7, pc}
insn id: 423
read regs: sp
write regs: sp
insn groups: thumb, thumb1only
The problem -- alluded to in #63 -- is that the C-based consumers are explicitly calling cs_regs_access()
while the Rust-based consumers have no way of knowing that this must be called. There are a number of ways of fixing this, but it seems that calling insn_detail()
should return an InsnDetail
for which regs_write()
/regs_read()
return correct data -- and therefore that insn_detail()
should explicitly make the call to cs_regs_access()
.
Here is a diff that implements this:
diff --git a/capstone-rs/src/capstone.rs b/capstone-rs/src/capstone.rs
index 9e77f9d..1ae313d 100644
--- a/capstone-rs/src/capstone.rs
+++ b/capstone-rs/src/capstone.rs
@@ -374,6 +374,53 @@ impl Capstone {
} else if Self::is_diet() {
Err(Error::IrrelevantDataInDiet)
} else {
+ /*
+ * To assure that our returned InsnDetail's regs_*() accessors
+ * return the correct data, we need to call into cs_regs_access().
+ * This interface is unsafe even by C's own poor standards for
+ * itself: it takes no bounds and has essentially undefined
+ * behavior if the arrays passed in correspond to the (same-sized)
+ * arrays in the detail member of the cs_insn. We therefore call
+ * into this interface with a scratch buffer on our stack and then
+ * manually copy it into the same structures in our underlying
+ * insn. This could presumably be made less verbose, but given
+ * the degree of unsafety, we have erred on the side of caution...
+ */
+ unsafe {
+ let mut rbuffer: [u16; 32] = [0; 32];
+ let mut wbuffer: [u16; 32] = [0; 32];
+ let mut rcnt: u8 = 0;
+ let mut wcnt: u8 = 0;
+ let mut detail = insn.insn.detail;
+
+ let regs_read: *mut u16 = &mut rbuffer[0];
+ let regs_write: *mut u16 = &mut wbuffer[0];
+ let regs_read_count: *mut u8 = &mut rcnt;
+ let regs_write_count: *mut u8 = &mut wcnt;
+
+ let err = cs_regs_access(
+ self.csh(), &insn.insn,
+ regs_read, regs_read_count,
+ regs_write, regs_write_count
+ );
+
+ if err != cs_err::CS_ERR_OK {
+ return Err(err.into());
+ }
+
+ for i in 0..rcnt {
+ (*detail).regs_read[i as usize] = rbuffer[i as usize];
+ }
+
+ (*detail).regs_read_count = rcnt;
+
+ for i in 0..wcnt {
+ (*detail).regs_write[i as usize] = wbuffer[i as usize];
+ }
+
+ (*detail).regs_write_count = wcnt;
+ }
+
Ok(unsafe { insn.detail(self.arch) })
}
}
Running with this diff, the capstone-rs-based cstool
has the correct results:
$ cstool -d --arch arm --mode thumb --code "f0bd"
1000: f0 bd pop {r4, r5, r6, r7, pc}
insn id: 423
read regs: sp
write regs: sp, r4, r5, r6, r7, pc
insn groups: thumb, thumb1only
Thanks for filing the issue! As you pointed out, this will mostly be solved once we fix #68.
It seems like you probably have the right idea that we should always call cs_regs_access
when in detail mode. However, cs_regs_access
only seems to be supported on x86, arm, and arm64, so we don't want to error out if a call fails (just ignore).
Just hit this. Are you planning on fixing this? If you are busy, is the fix something I can help with?
@vn-ki I did not make any plans to implement this myself. I would greatly appreciate help with this! Would you like to open a PR? I would be happy to help with any questions you have.
The reason I have not gotten to this is that the support is limited to only a few architectures, which potentially complicates the implementation. Some open questions:
- Should
regs_read()
/regs_write()
always callcs_regs_access
? - How should we manage the lifetimes? Should we just copy into Rust-managed memory?
If you open a PR, please reference this issue (#84) and the original #63.
@tmfink I'll be happy to open a PR.
Some ideas:
- We can keep the current API (and approach), and add two new fields to
capstone::InsnDetail
corresponding to regs_read and regs_write. Depending on the architecture, we can return the one fromcs_detail
or the one fromcs_regs_access
. The initialization of these new two arrays would be at the time of construction when callingcapstone::Capstone::insn_detail
. - We can do what the other bindings does and expose an extra
Capstone::reg_access(&Insn)
function which will call thecs_regs_access
and return aInsnRegs(CapstoneRegs, CapstoneRegs)
(lifetime shouldn't be an issue here since we are returning two fixed length arrays. Unless we want the regs to not live longer than the instruction). This is what the python binding seem to be doing, see details api and regs_access API.
I think approach 1 would be less confusing for the end user while approach 2 would be more inline with what upstream thinks is idiomatic (which might mean less headaches with future updates). Also with approach 2, maybe the user will have less of a headache if they want these regs to outlive the instruction?
Tell me if you think of any of these ones are good or if you had some other ideas?
@vn-ki Thanks for the proposals. My thoughts after thinking for a few days:
- Approach 1: transparent to user
- easier for user
- requires extra run-time checks based on whether the architecture supports
cs_regs_access()
when calling
- Approach 2: explicit
cs_regs_access()
API- annoying for user (need to manually track which arch supports the extra API)
- closer to what other bindings (including C API)
- Same for both
- Need extra storage for regs info, probably equivalent of 2 pointers either in
Capstone
orInsnDetail
- Need extra storage for regs info, probably equivalent of 2 pointers either in
I think I favor approach 1. We already bend over backwards to adapt to Capstone's "interesting" C API.
Some notes when implementing:
- For storage for the
regs_read
/regs_write
, I suggest using aOption<Box<cs_regs>>
- This won't require any special drop logic since Rust will own the allocation.
- You only need to initialize the
Option
when one ofInsnDetail::{regs_read, regs_write}
are called. - It's probably not worth using
MaybeUnit
to avoid the "zeroizing" cost; initialization only happens once perCapstone
instance
Hi, is @vn-ki still working on this?