bitfield-struct-rs

Support backing via `[u8; N]`

Open · daprilik opened this issue on Jan 31 '23 · 5 comments

First off: thanks for this crate! The syntax is clean and intuitive, and it is a lot easier to use than the other bitfield crates I've seen on crates.io.

Right now, the generated bitfield can only be backed by one of the integer types, which has three pretty major limitations:

  1. It limits the size of the bitfield to at most the size of a u128 (16 bytes).
  2. It forces additional padding bytes into types that don't need them (e.g. a bitfield composed of three u8 fields).
  3. It imposes the backing integer's alignment on the type, making it difficult to use naturally in repr(packed) structs (or repr(C) structs combined with something like the zerocopy crate). See the sketch after this list.
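To make points 2 and 3 concrete, here is a minimal sketch in plain Rust (no macro involved; IntBacked and ArrayBacked are made-up stand-ins) showing how the backing type dictates size and alignment:

```rust
use core::mem::{align_of, size_of};

// Stand-ins for the two backing strategies (not actual macro output):
struct IntBacked(u32);       // an integer-backed bitfield inherits u32's size and alignment
struct ArrayBacked([u8; 3]); // a byte-array-backed bitfield could be exactly 3 bytes, align 1

fn main() {
    assert_eq!((size_of::<IntBacked>(), align_of::<IntBacked>()), (4, 4));
    assert_eq!((size_of::<ArrayBacked>(), align_of::<ArrayBacked>()), (3, 1));
}
```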

A clean solution that overcomes all three of these limitations would be to support backing the generated bitfield with a [u8; N] array.

Happy to brainstorm specifics of syntax / featureset, but I was thinking it could look something like:

```rust
#[bitfield] // no explicit backing type specified
struct Foo {
    foo: u64,
    bar: u64,
    #[bits(2)]
    extra: u8,
    #[bits(6)]
    _reserved: u8, // must include explicit "padding" bits
}

static_assert_eq!(std::mem::size_of::<Foo>(), 8 + 8 + 1);

// no `From`/`Into` for any specific backing integer, but instead a `From`/`Into` for `[u8; N]`
```

Cheers!

daprilik (Jan 31 '23 20:01)

Using [u8; N] as the internal type is a good idea. However, this also comes with additional difficulties.

  • We still might have to be aware of alignment (but that can default to 1 and be changed with #[repr(align($x))]).
  • Also, implementing Copy for larger types might be a bad idea. So we have to decide whether it should not be implemented automatically at all, or only for smaller types (up to 128 bits or so).
  • How do we handle endianness? Do we default to the architecture's native byte order and provide an optional argument to specify it?
  • And lastly, this complicates the generation of the accessor functions, which now have to work with an array of bytes instead of a single integer type. Especially when values span multiple bytes, this might become somewhat tedious.

Despite these points, it's the better and more flexible solution going forward. So if you have good ideas, I'd be happy to hear them.

PS: Up to now, I've only had the use case of creating bitfields sized and aligned like the integer types (primarily for x86 register abstractions or packed data used inside atomics). So I've chosen this simpler-to-implement design...

PS2: Isn't the size of Foo 17 (8 + 8 + 1)?

wrenger (Feb 01 '23 20:02)

> • We still might have to be aware of alignment (but that can default to 1 and be changed with #[repr(align($x))]).

Ack. That seems reasonable to me (in the case where you don't specify an explicit backing type).

> • Also, implementing Copy for larger types might be a bad idea. So we have to decide whether it should not be implemented automatically at all, or only for smaller types (up to 128 bits or so).

Maybe it's best to just leave that up to the user, i.e. whether they want to add #[derive(Copy)] themselves? I'm personally not a huge fan of these sorts of "implicit up to a certain size N" APIs, but as long as it's documented, either way is fine, I suppose...

> • How do we handle endianness? Do we default to the architecture's native byte order and provide an optional argument to specify it?

I suppose the same could be asked of the current implementation? I could be wrong, but doesn't the current impl already assume native endian ordering when it comes to defining accessors to fields?

This is a reasonable question to ask, but also something that I don't think strictly needs to be answered in a hypothetical "v1" implementation of this feature.

Though, if we're spitballing: I could imagine specifying it as an additional parameter to #[bitfield], something like #[bitfield(endian = big)], defaulting to the native endianness.
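For illustration only (nothing crate-specific), here is what that byte-order choice means for a value that spans two bytes:

```rust
fn main() {
    let value: u16 = 0x1234;
    assert_eq!(value.to_le_bytes(), [0x34, 0x12]); // little-endian byte layout
    assert_eq!(value.to_be_bytes(), [0x12, 0x34]); // big-endian byte layout
    // to_ne_bytes() picks whichever matches the target architecture.
    assert_eq!(
        value.to_ne_bytes(),
        if cfg!(target_endian = "little") { [0x34, 0x12] } else { [0x12, 0x34] }
    );
}
```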

> • And lastly, this complicates the generation of the accessor functions, which now have to work with an array of bytes instead of a single integer type. Especially when values span multiple bytes, this might become somewhat tedious.

Well, yes, it might be a bit more complicated since fields can span multiple bytes. But you should be able to write the code so that each accessor function uses the same generated code "template", with the only differences being the bounds of the sub-slice of the inner array it touches, plus the bit offsets within the left-most and right-most bytes of that sub-slice.
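For example, a getter built from such a template might look roughly like the following (field name, offsets, and widths are made up here, and LSB-first bit numbering within each byte is assumed):

```rust
// Reads a 10-bit field starting at bit 12 of a [u8; 3] backing array.
fn get_field(data: &[u8; 3]) -> u16 {
    const OFFSET: usize = 12; // start bit of the field
    const BITS: usize = 10;   // width of the field
    let start = OFFSET / 8;            // left-most byte touched
    let end = (OFFSET + BITS - 1) / 8; // right-most byte touched
    let shift = OFFSET % 8;            // bit offset inside the left-most byte

    // Gather the touched bytes into a wider integer, then shift and mask.
    let mut raw: u32 = 0;
    for (i, &byte) in data[start..=end].iter().enumerate() {
        raw |= (byte as u32) << (8 * i);
    }
    let mask: u32 = (1 << BITS) - 1;
    ((raw >> shift) & mask) as u16
}
```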

If need be, we could collab on the specifics here, but I don't think it'll be too hard.

> PS: Up to now, I've only had the use case of creating bitfields sized and aligned like the integer types (primarily for x86 register abstractions or packed data used inside atomics). So I've chosen this simpler-to-implement design...

Totally reasonable! The fun with open source is how folks end up using your lib in ways you didn't even foresee, and, well, here we are :)

> PS2: Isn't the size of Foo 17 (8 + 8 + 1)?

Haha, yes, math is hard, oops 😅

daprilik (Feb 01 '23 21:02)

OK, good points. So: 1-byte alignment, no Copy, native endianness (in v1).

> If need be, we could collab on the specifics here, but I don't think it'll be too hard.

Thanks, I might come back to that 😅 In the next few days, I'll experiment a little bit with this idea.

> Well, yes, it might be a bit more complicated since fields can span multiple bytes. But you should be able to write the code so that each accessor function uses the same generated code "template", with the only differences being the bounds of the sub-slice of the inner array it touches, plus the bit offsets within the left-most and right-most bytes of that sub-slice.

To reduce duplicated code, other proc-macro libraries are split into two crates: one that generates the code and one with runtime functions called by the generated code. This might also be useful if we want to have our own traits/types...

Then we could implement a function like:

```rust
fn set_bits<const N: usize, const M: usize>(
    dst: &mut [u8; N],
    dst_offset: usize,
    src: &[u8; M],
    src_offset: usize,
    len: usize,
);
```

And in the generated code, we just call it. This would also simplify future optimizations (fewer, wider instructions for large and aligned dst/src, SIMD, ...).
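As a point of reference only (not the crate's actual implementation), a naive bit-by-bit version of such a helper could look like this:

```rust
/// Copies `len` bits from `src` (starting at bit `src_offset`) into `dst`
/// (starting at bit `dst_offset`), assuming LSB-first bit numbering within
/// each byte. Bit-by-bit for clarity only; a real implementation would move
/// whole bytes or words wherever possible.
fn set_bits<const N: usize, const M: usize>(
    dst: &mut [u8; N],
    dst_offset: usize,
    src: &[u8; M],
    src_offset: usize,
    len: usize,
) {
    for i in 0..len {
        let (s_byte, s_bit) = ((src_offset + i) / 8, (src_offset + i) % 8);
        let (d_byte, d_bit) = ((dst_offset + i) / 8, (dst_offset + i) % 8);
        let bit = (src[s_byte] >> s_bit) & 1;
        // Clear the destination bit, then set it to the source bit.
        dst[d_byte] = (dst[d_byte] & !(1 << d_bit)) | (bit << d_bit);
    }
}
```

A generated setter would then only have to fill in the field's constants, e.g. something like set_bits(&mut self.0, FIELD_OFFSET, &value.to_ne_bytes(), 0, FIELD_BITS), where the names are hypothetical.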


Oh, and I forgot one thing: custom types and enums.

Currently, we support types that are convertible to the bitfield's integer type. This would have to change to From<[u8; N]> + Into<[u8; N]>, with either the same N as the bitfield or an extra way of specifying a smaller N.
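As a sketch of what such a bound could look like for a small custom type (none of these names come from the crate), an enum converting to and from a 1-byte array might be:

```rust
// Hypothetical custom field type under the proposed bound, here with a
// 1-byte array (the "smaller N" case).
#[derive(Clone, Copy, Debug)]
enum Mode {
    Off = 0,
    On = 1,
    Auto = 2,
}

impl From<[u8; 1]> for Mode {
    fn from(raw: [u8; 1]) -> Self {
        match raw[0] & 0b11 {
            0 => Mode::Off,
            1 => Mode::On,
            _ => Mode::Auto,
        }
    }
}

impl From<Mode> for [u8; 1] {
    // This also provides `Into<[u8; 1]>` for `Mode` via the blanket impl.
    fn from(mode: Mode) -> Self {
        [mode as u8]
    }
}
```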

wrenger (Feb 01 '23 23:02)

Yeah, sounds good! Thanks for looking into this!

> Oh, and I forgot one thing: custom types and enums.

> Currently, we support types that are convertible to the bitfield's integer type. This would have to change to From<[u8; N]> + Into<[u8; N]>, with either the same N as the bitfield or an extra way of specifying a smaller N.

As a suggestion: why not make the required conversion function dependent on the field size, rather than the size of the backing bitfield-struct?

Even in the current implementation, it seems a bit weird that, in order to use a custom type with, say, a #[bits(4)] field (part of a larger #[bitfield(u64)]), the type needs a From/Into bound on u64 rather than on the "next biggest" integer type (i.e. u8), mirroring the getter/setter.
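To illustrate the suggestion (purely hypothetical, not the crate's current bounds), a custom type for a 4-bit field would then only need conversions for u8:

```rust
// `Priority` is made up; it holds a 4-bit value inside a larger bitfield.
#[derive(Clone, Copy)]
struct Priority(u8);

impl From<u8> for Priority {
    fn from(raw: u8) -> Self {
        Priority(raw & 0x0F) // keep only the 4 field bits
    }
}

impl From<Priority> for u8 {
    fn from(p: Priority) -> Self {
        p.0
    }
}
```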

daprilik (Feb 01 '23 23:02)

I've created PR #7 for this.

wrenger (Feb 03 '23 19:02)