rust-bindgen
Support for zerocopy
I am looking to use the custom derive callbacks API to add zerocopy FromBytes and AsBytes derives to generated types. I have to rely on knowing exactly which types can safely derive these traits, as structs with pointers and certain unions cannot derive AsBytes and FromBytes.
Is there interest in building deeper support for zerocopy? It is a useful crate for interacting with C APIs (mmap a file, read a struct). I don't know how "standard" zerocopy is within the ecosystem, though.
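To make the use case concrete, here is a minimal sketch of reading a C struct out of a byte buffer by hand, which is the kind of unsafe code a FromBytes-style derive lets callers avoid. The Header type and the read_header function are hypothetical and are not part of bindgen or zerocopy.

```rust
use std::mem::size_of;

/// Hypothetical C-compatible struct: integer fields only, no padding.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub magic: u32,
    pub len: u16,
    pub flags: u16,
}

/// Reads a `Header` from the front of `bytes`, or returns `None` if the
/// buffer is too short.
pub fn read_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < size_of::<Header>() {
        return None;
    }
    // SAFETY: the length was checked above, every bit pattern is a valid
    // `Header` (integers only, no padding), and `read_unaligned` does not
    // require `bytes` to be aligned for `Header`.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const Header) })
}
```

The safety comment is only valid because Header contains nothing but integers; for a struct with a pointer or an enum field the same code would be unsound, which is why the derive has to be applied selectively.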
chiming in to mention I'd also be very interested in seeing this implemented.
until Rust gets support for language-level safe transmute, being able to hook into the various third-party safe transmute ecosystems (e.g. zerocopy, bytemuck, etc.) would be super valuable in bindgen'd code.
Hi @adamlesinski and @daprilik. I've checked both zerocopy and bytemuck as I've never used them before. I wrote a script to find out how many crates use both bindgen and zerocopy/bytemuck and found that, of the ~1200 crates depending on bindgen, only 2 also depend on bytemuck. This means that providing explicit support for zerocopy or bytemuck would be a slippery slope, as it would set a precedent for adding support for several other crates with custom derives. All of this assumes that I got this info right, so if that is not the case let me know.
We could try to find an alternative solution that provides a more precise way to decide which types get custom derives and which ones do not. However, I'm failing to see what such an API would look like.
I agree that taking a direct dependency on either of the crates wouldn't be the best idea: the safe transmute ecosystem continues to be in flux, so best not to bet on any particular horse just yet.
Rather, I would argue that what we really want here is a mechanism for bindgen to allow conditionally adding a derive based on whether or not a type supports being safely transmuted (i.e: doesn't contain any non-POD types, has appropriate padding bytes, etc...). That's all info that (I think?) bindgen already has, so while it might require a bit of plumbing, the resulting API could be agnostic wrt. which specific library / derive macro end-users would want to use.
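As a rough illustration of the check described above, the sketch below decides "safe to transmute" over a toy type model. The Ty enum and its methods are invented for this example and are not bindgen's internal representation; the sketch also conservatively treats any padding as disqualifying, which is stricter than a FromBytes-style derive needs.

```rust
/// Toy model of a C type (hypothetical; not bindgen's IR).
pub enum Ty {
    /// Integer or float of the given size in bytes. Sizes are assumed to be
    /// non-zero powers of two, with alignment equal to size.
    Scalar(usize),
    /// Raw pointer: fixed layout, but not plain data.
    Pointer,
    /// `repr(C)` struct with fields in declaration order.
    Struct(Vec<Ty>),
}

impl Ty {
    pub fn align(&self) -> usize {
        match self {
            Ty::Scalar(n) => *n,
            Ty::Pointer => std::mem::size_of::<usize>(),
            Ty::Struct(fields) => fields.iter().map(Ty::align).max().unwrap_or(1),
        }
    }

    /// Returns `(size, has_padding)` following the `repr(C)` layout rules.
    fn layout(&self) -> (usize, bool) {
        match self {
            Ty::Scalar(n) => (*n, false),
            Ty::Pointer => (std::mem::size_of::<usize>(), false),
            Ty::Struct(fields) => {
                let (mut offset, mut padded) = (0usize, false);
                for field in fields {
                    let a = field.align();
                    let aligned = (offset + a - 1) / a * a;
                    padded |= aligned != offset; // gap before this field
                    let (size, inner_padded) = field.layout();
                    padded |= inner_padded;
                    offset = aligned + size;
                }
                let a = self.align();
                let total = (offset + a - 1) / a * a;
                (total, padded || total != offset) // trailing padding counts too
            }
        }
    }

    fn has_pointer(&self) -> bool {
        match self {
            Ty::Scalar(_) => false,
            Ty::Pointer => true,
            Ty::Struct(fields) => fields.iter().any(Ty::has_pointer),
        }
    }

    /// Plain data here means: no pointers anywhere and no padding bytes.
    pub fn is_pod(&self) -> bool {
        !self.has_pointer() && !self.layout().1
    }
}
```

A struct of u32, u16, u16 passes, while u8 followed by u32 fails because of the three padding bytes between the fields.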
Interesting. I think bindgen should be able to provide such information. I see a few things we would need to figure out first:
- There might be some trouble with opaque types as they are exposed as byte-arrays by bindgen but could contain non-pod values. For example, if a type with a pointer field is marked as opaque, should it be considered pod or not?
- Is repr really important? According to zerocopy it is not, but bytemuck explicitly requires #[repr(C)] or #[repr(transparent)]. So either option could be seen as bindgen picking sides.
cc @emilio and @kulp
> - There might be some trouble with opaque types as they are exposed as byte-arrays by bindgen but could contain non-pod values. For example, if a type with a pointer field is marked as opaque, should it be considered pod or not?
Do you have an example of something like this (just so I know what you're talking about)?
Is opaque-ness a property introduced by bindgen itself? If so, does bindgen know the actual layout of the underlying type? If so, I would imagine it could still determine if the type is POD / not POD?
> - Is repr really important? According to zerocopy it is not
This might be a documentation gap, as if you look under the hood of the FromBytes derive, I'm fairly sure it also validates reprs.
Might also be worth digging in to what the official safe-transmute working group's draft guidelines are here
> Do you have example of something like this (just so I know what you're talking about)?
> Is opaque-ness a property introduced by bindgen itself? If so, does bindgen know the actual layout of the underlying type? If so, I would imagine it could still determine if the type is POD / not POD?
Sure, the guide explains this better than me, but the short story is that if a C/C++ type is not translatable to a Rust type, it can be marked as opaque and bindgen produces a blob of bytes with the same layout (this might be only size, I'm not sure). In principle such a type could have a pointer inside it, as any type could be tagged as opaque. Meaning that its layout would be that of a usize but it would not actually be POD.
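The point about layout versus meaning can be sketched with two types that the compiler lays out identically. Both structs are hypothetical, and the exact blob that bindgen emits for an opaque type may look different.

```rust
use std::mem::{align_of, size_of};

/// What the C side actually contains: a single pointer field.
#[repr(C)]
pub struct Handle {
    pub data: *mut u8,
}

/// Roughly what an opaque stand-in looks like: same size and alignment,
/// but nothing in the type says that the bytes encode a pointer.
#[repr(C)]
pub struct OpaqueHandle {
    pub blob: [usize; 1],
}

/// True when the two types are indistinguishable by size and alignment.
pub fn same_layout() -> bool {
    size_of::<Handle>() == size_of::<OpaqueHandle>()
        && align_of::<Handle>() == align_of::<OpaqueHandle>()
}
```

Because same_layout() holds, a POD check that only looks at the emitted fields would accept OpaqueHandle, even though filling it with arbitrary bytes hands C code a garbage pointer.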
> This might be a documentation gap, as if you look under-the-hood of the FromBytes derive, I'm fairly sure it also validates reprs.
afaik repr for enumerations is important but not for structs. See "Why isn’t an explicit representation required for structs?". This also coincides with the fact that the link you sent is inside the derive_from_bytes_enum function.
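A short sketch of why enums are the special case: the representation fixes the size and the discriminant values, and only some bit patterns of that size are valid. The Mode enum and mode_from_byte are hypothetical names used for illustration.

```rust
/// Hypothetical C enum mirrored with an explicit one-byte representation.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Off = 0,
    On = 1,
}

/// Checked conversion: only discriminants that actually exist are accepted.
pub fn mode_from_byte(b: u8) -> Option<Mode> {
    match b {
        0 => Some(Mode::Off),
        1 => Some(Mode::On),
        _ => None, // 254 of the 256 possible bytes are not a valid `Mode`
    }
}
```

A derive that promises "any byte sequence is a valid value" cannot be applied to Mode, whereas a struct of plain integers has no such invalid patterns regardless of field order.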
> Might also be worth digging in to what the official safe-transmute working group's draft guidelines are here
Already chatting with them :grin:. I'll keep this up to date with whatever information I can gather from them.
So after being illuminated by the safe-transmute working group it seems we can do this but there's a caveat. Types with private fields shouldn't be treated as POD. Which means that opaque types cannot be treated as POD either.
Apparently zerocopy does not require a repr on structs for FromBytes because you also have AsBytes, which does. So both crates agree on this. And bindgen needs to emit a specific repr for its types to be FFI-safe, so there's that.
So yeah, it seems we have more or less clear criteria to decide whether a type can be treated as POD or not. I think adding an extra parameter to add_derives containing this information would be reasonable. Something like:
#[non_exhaustive]
pub struct TypeInfo {
    pub is_pod: bool,
}
so we can easily add more info if required in the future.
Thoughts @emilio?
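One way the proposed TypeInfo could be consumed is sketched below. The free function add_derives and the TypeInfo::new helper are hypothetical; this is not the signature of bindgen's existing add_derives callback, only an illustration of the extra-parameter idea.

```rust
/// Proposed extra information about a generated type.
#[non_exhaustive]
pub struct TypeInfo {
    pub is_pod: bool,
}

impl TypeInfo {
    /// Hypothetical constructor, added only so the sketch can be exercised.
    pub fn new(is_pod: bool) -> Self {
        TypeInfo { is_pod }
    }
}

/// Hypothetical callback: returns the extra derives for the type `_name`.
/// The user picks the crate; the generator only reports POD-ness.
pub fn add_derives(_name: &str, info: &TypeInfo) -> Vec<String> {
    if info.is_pod {
        vec![
            "zerocopy::FromBytes".to_string(),
            "zerocopy::AsBytes".to_string(),
        ]
    } else {
        Vec::new()
    }
}
```

Swapping the returned strings for bytemuck's derives would need no change on the generator side, which is the library-agnostic property asked for earlier in the thread.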
Now that https://github.com/rust-lang/rust-bindgen/pull/2355 has been merged, we can move forward towards collecting information from each type to decide if it can be treated as POD or not.
Any further plan for this support?
:wave: It is planned in the "near" future. I'm working on another PR that also involves the custom derive callback so I cannot start working on this without breaking the other PR. Once I'm done with it I'll give this proper attention.
With --with-derive-custom in the CLI (backed by add_derives), it's easy to add derives from zerocopy. However, it doesn't work well with bitfields, for 2 reasons:
- The generated bitfield generic struct is repr(C), but zerocopy requires repr(transparent) for generic structs.
- The generated bitfield struct doesn't have the custom derive.
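For the first point, a transparent variant of the bitfield unit could look like the sketch below. BitfieldUnit is a hypothetical name, and this only shows that the layout would be unchanged; it is not what bindgen currently emits.

```rust
use std::mem::{align_of, size_of};

/// Sketch of a single-field generic wrapper marked `repr(transparent)`.
/// It is guaranteed to have the same layout as its `Storage` field.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, Default, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub struct BitfieldUnit<Storage> {
    storage: Storage,
}

impl<Storage> BitfieldUnit<Storage> {
    #[inline]
    pub const fn new(storage: Storage) -> Self {
        Self { storage }
    }
}

/// Size and alignment of the unit when backed by four bytes of storage.
pub fn unit_layout() -> (usize, usize) {
    (
        size_of::<BitfieldUnit<[u8; 4]>>(),
        align_of::<BitfieldUnit<[u8; 4]>>(),
    )
}
```

With [u8; 4] storage the wrapper is 4 bytes with alignment 1, the same values the generated layout test for SS__bindgen_ty_1 asserts.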
CLI command
bindgen .\f.h --allowlist-type SS --rust-target 1.68 --with-derive-custom=".*=zerocopy::AsBytes,zerocopy::FromBytes,zerocopy::FromZeroes" > o.rs
The C header
#include "inttypes.h"
#define X_WIDTH 28
#define Y_WIDTH 4
#pragma pack(push, 1)
typedef struct
{
    struct {
        uint32_t x:X_WIDTH;
        uint32_t y:Y_WIDTH;
    } s;
} SS;
#pragma pack(pop)
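For orientation, the header above packs x and y into a single 4-byte unit. The sketch below shows that packing by hand, assuming a little-endian target where the first bitfield occupies the low bits (bitfield layout is implementation-defined in C, so this is an assumption, not a guarantee). The pack and unpack functions are hypothetical helpers.

```rust
const X_WIDTH: u32 = 28;
const Y_WIDTH: u32 = 4;
const X_MASK: u32 = (1 << X_WIDTH) - 1;
const Y_MASK: u32 = (1 << Y_WIDTH) - 1;

/// Packs `x` (28 bits) and `y` (4 bits) into the 4 storage bytes.
/// Assumes little-endian byte order with `x` in the low bits.
pub fn pack(x: u32, y: u32) -> [u8; 4] {
    ((x & X_MASK) | ((y & Y_MASK) << X_WIDTH)).to_le_bytes()
}

/// Inverse of `pack`: returns `(x, y)`.
pub fn unpack(bytes: [u8; 4]) -> (u32, u32) {
    let raw = u32::from_le_bytes(bytes);
    (raw & X_MASK, raw >> X_WIDTH)
}
```

Every combination of the 32 bits decodes to a valid (x, y) pair, so the storage itself is plain data.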
Generated Rust source:
/* automatically generated by rust-bindgen 0.68.1 */
#[repr(C)]
#[derive(Copy, Clone, Debug, Default, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub struct __BindgenBitfieldUnit<Storage> {
    storage: Storage,
}
impl<Storage> __BindgenBitfieldUnit<Storage> {
    #[inline]
    pub const fn new(storage: Storage) -> Self {
        Self { storage }
    }
}
impl<Storage> __BindgenBitfieldUnit<Storage>
where
    Storage: AsRef<[u8]> + AsMut<[u8]>,
{
    #[inline]
    pub fn get_bit(&self, index: usize) -> bool {
        debug_assert!(index / 8 < self.storage.as_ref().len());
        let byte_index = index / 8;
        let byte = self.storage.as_ref()[byte_index];
        let bit_index = if cfg!(target_endian = "big") {
            7 - (index % 8)
        } else {
            index % 8
        };
        let mask = 1 << bit_index;
        byte & mask == mask
    }
    #[inline]
    pub fn set_bit(&mut self, index: usize, val: bool) {
        debug_assert!(index / 8 < self.storage.as_ref().len());
        let byte_index = index / 8;
        let byte = &mut self.storage.as_mut()[byte_index];
        let bit_index = if cfg!(target_endian = "big") {
            7 - (index % 8)
        } else {
            index % 8
        };
        let mask = 1 << bit_index;
        if val {
            *byte |= mask;
        } else {
            *byte &= !mask;
        }
    }
    #[inline]
    pub fn get(&self, bit_offset: usize, bit_width: u8) -> u64 {
        debug_assert!(bit_width <= 64);
        debug_assert!(bit_offset / 8 < self.storage.as_ref().len());
        debug_assert!((bit_offset + (bit_width as usize)) / 8 <= self.storage.as_ref().len());
        let mut val = 0;
        for i in 0..(bit_width as usize) {
            if self.get_bit(i + bit_offset) {
                let index = if cfg!(target_endian = "big") {
                    bit_width as usize - 1 - i
                } else {
                    i
                };
                val |= 1 << index;
            }
        }
        val
    }
    #[inline]
    pub fn set(&mut self, bit_offset: usize, bit_width: u8, val: u64) {
        debug_assert!(bit_width <= 64);
        debug_assert!(bit_offset / 8 < self.storage.as_ref().len());
        debug_assert!((bit_offset + (bit_width as usize)) / 8 <= self.storage.as_ref().len());
        for i in 0..(bit_width as usize) {
            let mask = 1 << i;
            let val_bit_is_set = val & mask == mask;
            let index = if cfg!(target_endian = "big") {
                bit_width as usize - 1 - i
            } else {
                i
            };
            self.set_bit(index + bit_offset, val_bit_is_set);
        }
    }
}
#[repr(C)]
#[derive(
    Debug, Copy, Clone, zerocopy :: AsBytes, zerocopy :: FromBytes, zerocopy :: FromZeroes,
)]
pub struct SS {
    pub s: SS__bindgen_ty_1,
}
#[repr(C, packed)]
#[derive(
    Debug, Copy, Clone, zerocopy :: AsBytes, zerocopy :: FromBytes, zerocopy :: FromZeroes,
)]
pub struct SS__bindgen_ty_1 {
    pub _bitfield_align_1: [u8; 0],
    pub _bitfield_1: __BindgenBitfieldUnit<[u8; 4usize]>,
}
#[test]
fn bindgen_test_layout_SS__bindgen_ty_1() {
    assert_eq!(
        ::std::mem::size_of::<SS__bindgen_ty_1>(),
        4usize,
        concat!("Size of: ", stringify!(SS__bindgen_ty_1))
    );
    assert_eq!(
        ::std::mem::align_of::<SS__bindgen_ty_1>(),
        1usize,
        concat!("Alignment of ", stringify!(SS__bindgen_ty_1))
    );
}
impl SS__bindgen_ty_1 {
    #[inline]
    pub fn x(&self) -> u32 {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(0usize, 28u8) as u32) }
    }
    #[inline]
    pub fn set_x(&mut self, val: u32) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(0usize, 28u8, val as u64)
        }
    }
    #[inline]
    pub fn y(&self) -> u32 {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(28usize, 4u8) as u32) }
    }
    #[inline]
    pub fn set_y(&mut self, val: u32) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(28usize, 4u8, val as u64)
        }
    }
    #[inline]
    pub fn new_bitfield_1(x: u32, y: u32) -> __BindgenBitfieldUnit<[u8; 4usize]> {
        let mut __bindgen_bitfield_unit: __BindgenBitfieldUnit<[u8; 4usize]> = Default::default();
        __bindgen_bitfield_unit.set(0usize, 28u8, {
            let x: u32 = unsafe { ::std::mem::transmute(x) };
            x as u64
        });
        __bindgen_bitfield_unit.set(28usize, 4u8, {
            let y: u32 = unsafe { ::std::mem::transmute(y) };
            y as u64
        });
        __bindgen_bitfield_unit
    }
}
#[test]
fn bindgen_test_layout_SS() {
    const UNINIT: ::std::mem::MaybeUninit<SS> = ::std::mem::MaybeUninit::uninit();
    let ptr = UNINIT.as_ptr();
    assert_eq!(
        ::std::mem::size_of::<SS>(),
        4usize,
        concat!("Size of: ", stringify!(SS))
    );
    assert_eq!(
        ::std::mem::align_of::<SS>(),
        1usize,
        concat!("Alignment of ", stringify!(SS))
    );
    assert_eq!(
        unsafe { ::std::ptr::addr_of!((*ptr).s) as usize - ptr as usize },
        0usize,
        concat!("Offset of field: ", stringify!(SS), "::", stringify!(s))
    );
}
My questions are:
- Is it reasonable to add support for a repr(transparent) generated bitfield type?
- Is it reasonable to add CLI support for a custom derive macro on the generated bitfield type?