
Add bitfields support

Open Andy-Python-Programmer opened this issue 3 years ago • 23 comments

This RFC adds support for bit-fields in repr(C) structs by

  • Introducing a new attribute bits(N) that can be applied to integer fields.
  • Allowing such annotated fields to be unnamed.

Example:

use core::mem;

#[repr(C)]
struct HbaPrdtEntry {
    data_base_upper: u32,
    data_base_address_upper: u32,
    reserved: u32,

    #[bits(22)] byte_count: u32,
    #[bits(9)] reserved_2: u32,
    #[bits(1)] interrupt_on_completion: u32,
}

assert_eq!(mem::size_of::<HbaPrdtEntry>(), 16);
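For comparison, here is a minimal sketch of how the same 16-byte struct can be approximated in today's Rust by folding the three bit-fields into one `u32` and masking by hand. The bit positions (byte count in the low 22 bits, interrupt flag in bit 31) are an assumption about how a typical little-endian C compiler allocates them, not something the RFC text above specifies, and the accessor names are made up for illustration.

```rust
use std::mem;

/// Hand-written stand-in for the RFC example: the three bit-fields share one `u32`.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct HbaPrdtEntry {
    pub data_base_upper: u32,
    pub data_base_address_upper: u32,
    pub reserved: u32,
    /// Assumed layout: bits 0..=21 byte count, bits 22..=30 reserved, bit 31 interrupt flag.
    bits: u32,
}

// Checked at compile time: four `u32` words, no extra padding.
const _: () = assert!(mem::size_of::<HbaPrdtEntry>() == 16);

const BYTE_COUNT_MASK: u32 = (1 << 22) - 1;
const INTERRUPT_BIT: u32 = 1 << 31;

impl HbaPrdtEntry {
    pub fn byte_count(&self) -> u32 {
        self.bits & BYTE_COUNT_MASK
    }

    /// Stores the low 22 bits of `value`; higher bits are discarded.
    pub fn set_byte_count(&mut self, value: u32) {
        self.bits = (self.bits & !BYTE_COUNT_MASK) | (value & BYTE_COUNT_MASK);
    }

    pub fn interrupt_on_completion(&self) -> bool {
        self.bits & INTERRUPT_BIT != 0
    }

    pub fn set_interrupt_on_completion(&mut self, on: bool) {
        if on {
            self.bits |= INTERRUPT_BIT
        } else {
            self.bits &= !INTERRUPT_BIT
        }
    }
}
```

The catch, and the reason the RFC asks for compiler support, is that the shift and mask constants have to be re-derived by hand for every target whose C ABI allocates bit-fields differently.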

This pull request reopens #3064, as mahkoh is no longer participating in the Rust community.

Issue: #314

Rendered preview

Andy-Python-Programmer avatar Apr 22 '21 08:04 Andy-Python-Programmer

Let's also take a look at other designs from other languages, like Ada, etc.

leonardo-m avatar Apr 23 '21 16:04 leonardo-m

I think we should instead do something like the following:

#[repr(C)]
struct MyStruct {
    #[repr(bitfield(u32))]
    a: uint<5>,
    #[repr(bitfield(u32))]
    b: uint<3>,
    #[repr(bitfield(new, u32))]
    c: uint<12>,
    #[repr(bitfield(u16, new))]
    d: u16,
    e: bool,
    #[repr(bitfield(bool))]
    f: bool,
    #[repr(bitfield(bool))]
    g: bool,
}

which is equivalent to the C struct:

#include <stdint.h>

struct MyStruct {
    uint32_t a:5;
    uint32_t b:3;
    uint32_t :0, c:12;
    uint16_t d:16, :0;
    _Bool e, f:1, g:1;
};

This representation gives the advantage that Rust fields have the actual type (e.g. uint<3> instead of a weird u32) that can be stored in the struct, instead of copying the C misstep of having a fake type whose normal values you can't all actually use (e.g. a 5-bit int32_t bitfield can only store -16 through 15, instead of the full expected range of 32-bit values).

I'd expect Rust to get generic integers (uint<N>/int<N>), at least for widths <= 128 bits, because bitfields are far from the only place where arbitrary generic integers are quite useful; they are also needed for representing bit-masks for SIMD.
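To make the "actual type" point concrete, here is a rough sketch of what a library-level stand-in for uint<N> could look like with const generics. `UInt`, `MAX`, `new` and `get` are hypothetical names chosen for this example; a real generic-integer feature would be a language type, not a newtype, and this sketch only covers widths up to 32 bits.

```rust
/// Hypothetical stand-in for a generic `uint<N>`: an `N`-bit unsigned value, `N <= 32`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UInt<const N: u32>(u32);

impl<const N: u32> UInt<N> {
    /// Largest value representable in `N` bits.
    pub const MAX: u32 = if N >= 32 { u32::MAX } else { (1u32 << N) - 1 };

    /// Returns `None` when `value` does not fit in `N` bits.
    pub fn new(value: u32) -> Option<Self> {
        if value <= Self::MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Returns the stored value, which is always `<= Self::MAX`.
    pub fn get(self) -> u32 {
        self.0
    }
}
```

With this shape, a 5-bit field can only ever hold 0 through 31, and the out-of-range case is handled once, at construction, rather than silently truncated on every store.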

programmerjake avatar Apr 23 '21 22:04 programmerjake

"needed" is perhaps an overstatement of the SIMD situation.

As to the RFC:

  • "bit-field fields" is all kinds of poor, I want that bike shed to be some other color.
  • why are we allowing bool fields of more than one bit? We should just disallow that and block off a potential source of UB.
  • since we're adding a new struct style we should just make it be the case that if the struct declaration can't be converted to C it's a compile error.
  • I think many of the future possibility bullets should be resolved in the RFC because their answer affects if we should take any of the alternative paths.

Lokathor avatar Apr 23 '21 22:04 Lokathor

Bitfields are interesting on their own, even without honoring the C layout. They can be used to pack information more densely: for instance, the bitflags crate replacing a struct of bools. At the moment, there is no robust way to pack small enums inside the same byte. I would suggest:

  • allowing C-like enums in bitfields when the enum's range fits the allocated bits, with no bounds checks ;
  • allowing non-repr(C) bitfields, for Rust-optimized layouts ;
  • allowing bitfields in tuple structs ;
  • allowing bitfields in an enum variant's fields.

cjgillot avatar Apr 25 '21 19:04 cjgillot

If you don't need to honor the C layout rules, you can achieve the same effect as bitfields very simply with just a macro_rules or two, and you get a lot more control over it than with any set of language rules that would have to fit all situations for all people across the entire language.

So I honestly don't think we need non-repr-C bitfields.
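As an illustration of the macro_rules approach, a minimal sketch might look like the following. The macro name `bitfield_accessors`, the `Status` type and its field positions are all invented for this example and are not taken from any existing crate.

```rust
/// Generates a getter and a setter for a bit range of a `u32` newtype.
/// `$shift` is the position of the lowest bit, `$width` the number of bits (1..=31).
macro_rules! bitfield_accessors {
    ($ty:ident, $get:ident, $set:ident, $shift:expr, $width:expr) => {
        impl $ty {
            pub fn $get(&self) -> u32 {
                (self.0 >> $shift) & ((1u32 << $width) - 1)
            }

            /// Stores the low `$width` bits of `value`; higher bits are discarded.
            pub fn $set(&mut self, value: u32) {
                let mask = ((1u32 << $width) - 1) << $shift;
                self.0 = (self.0 & !mask) | ((value << $shift) & mask);
            }
        }
    };
}

#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
pub struct Status(u32);

// Hypothetical layout: bits 0..=2 hold `kind`, bits 3..=7 hold `level`.
bitfield_accessors!(Status, kind, set_kind, 0, 3);
bitfield_accessors!(Status, level, set_level, 3, 5);
```

Because the layout is spelled out in the macro invocation, it is the same on every target, which is exactly the property that repr(C) bitfields cannot offer.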

Lokathor avatar Apr 25 '21 20:04 Lokathor

allowing C-like enums in bitfields when the enum's range fits the allocated bits, with no bounds checks

I would say this is more likely for a non-safe language to say. If we actually add support for this then we will need to add syntax like unsafe enum/struct.

allowing bitfields in tuple structs

What about transparent tuple structs? We could restrict transparent structs to non-bitfield structs to make this feature happen.

Andy-Python-Programmer avatar Apr 26 '21 07:04 Andy-Python-Programmer

I would say this is more likely for a non-safe language to say. If we actually add support for this then we will need to add syntax like unsafe enum/struct.

This is a fairly simple static verification step. The compiler can trivially determine if a given enum has all possible bit patterns of a given bit mask inhabited, and then either allow without bounds check or compile error.
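The kind of check described here can already be approximated by hand. The sketch below assumes a made-up `Mode` enum stored in 2 bits; because all four bit patterns are inhabited, decoding is total and needs no error path, and a compile-time assertion stands in for the verification a compiler could do automatically.

```rust
/// A C-like enum whose four variants cover every 2-bit pattern.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Mode {
    Idle = 0,
    Read = 1,
    Write = 2,
    Fault = 3,
}

impl Mode {
    pub const BITS: u32 = 2;
    const ALL: [Mode; 4] = [Mode::Idle, Mode::Read, Mode::Write, Mode::Fault];

    /// Total decoding: every masked value is a valid variant.
    pub const fn from_bits(raw: u8) -> Mode {
        Self::ALL[(raw & 0b11) as usize]
    }
}

// Compile-time check that the table has one entry per bit pattern, in order.
const _: () = {
    assert!(Mode::ALL.len() == 1usize << Mode::BITS);
    let mut i = 0;
    while i < Mode::ALL.len() {
        assert!(Mode::ALL[i] as usize == i);
        i += 1;
    }
};
```

If a variant were removed, the table would no longer have 2^BITS entries and the build would fail, which mirrors the "allow without bounds check or compile error" behavior suggested above.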

allowing bitfields in tuple structs

Why would this be treated any differently at all from curly brace structs?

Lokathor avatar Apr 26 '21 16:04 Lokathor

For some prior art, I've always loved how C# does this:

[StructLayout(LayoutKind.Explicit, Size = 4)]
struct Foo {
  [FieldOffset(0)]
  public byte bar;

  [FieldOffset(0)]
  public int baz;
}

To guess how we might adapt that to Rust:

#[repr(explicit(size = 4))]
struct Foo {
  #[repr(offset = 0)]
  bar: u8,
 
  #[repr(offset = 0)]
  baz: u32,
}

#[repr(explicit(size = 1))]
struct Flags {
  #[repr(offset = 0, size = 1)]
  one: bool,

  #[repr(offset = 1, size = 1)]
  two: bool,

  #[repr(offset = 5, size = 1)]
  three: bool,
}

To summarize,

Add offset and size to the repr attribute and allow using this on fields when the struct is annotated with repr(explicit).
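For reference, the overlapping-fields case (both `bar` and `baz` at offset 0) can already be written in today's Rust as a `repr(C)` union, though reading a field then requires `unsafe`. This is a sketch of existing language features only; the `repr(explicit)` and `repr(offset = ...)` attributes above remain hypothetical, and `first_byte_of` is a helper name invented for this example.

```rust
use std::mem;

/// Today's closest equivalent to two fields at offset 0: a `repr(C)` union.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Foo {
    pub bar: u8,
    pub baz: u32,
}

// The union is as large as its largest field.
const _: () = assert!(mem::size_of::<Foo>() == 4);

/// Writes `value` through `baz` and reads the first byte in memory back through `bar`.
pub fn first_byte_of(value: u32) -> u8 {
    let foo = Foo { baz: value };
    // SAFETY: all four bytes were initialized via `baz`, and any bit pattern is a valid `u8`.
    unsafe { foo.bar }
}
```

Which byte comes back depends on the target's endianness, so this covers byte-granular overlap only; it does not help with the bit-granular `Flags` case.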


I do think bit-n integers would fit nicely here but I think that's orthogonal and shouldn't be part of this RFC.

mehcode avatar Apr 26 '21 21:04 mehcode

Should the prior art section also cover the bitfield crate? It is quite nice in my experience. I think it would probably also be good for the RFC to explain why it needs to be in the language vs in a crate.

anp avatar Apr 29 '21 15:04 anp

I think the bitfield crate, and the fact that you can even make your own alternative if you don't like how it handles things, is proof enough that we don't need language support for repr(rust) bitfields.

However, lang support for repr(C) bitfields brings in a high level of confidence that the compiler will correctly match the layout of the local C ABI when compiling for a target. So to me that's the valuable thing to focus on here.

Lokathor avatar Apr 29 '21 15:04 Lokathor

@Lokathor

Macro-generated field emulation (via explicit getter/setter methods) is clunky in comparison to an actual field. Nothing in Stable and Nightly will fix this: there is no DerefMove or DerefSet to simulate properties; there is no replacement for struct construction syntax and patterns. Property-like const fns would be a great addition to the language, for sure.

For layout concerns repr(C) (matching local C ABI) may be desired for FFI, but when interacting with hardware, file formats, etc. a repr(Stable) as in "what you write is what you get", is more valuable. But the latter is not covered by this RFC.


C bitfields are a nightmare. They're tacky and platform dependent. This RFC shouldn't spend time making them ultra-ergonomic to write. Instead, I think this RFC should call out that:

  • This requires compiler support due to how the ABI varies across compilers/calling conventions and architecture.
  • The final size, offset, etc. of the bitfields are an implementation detail matching the local C ABI. Transmuting a bitfield to an integer type is generally undefined behavior, except if properly done on a specific architecture. Some ABIs may add padding at the bit level; the location of the padding is part of the ABI.
  • Syntactically, the design of C ABI bitfields will not match the theoretical "ergonomic, layout stable" bitfield. Nor will it impact their design. These ideas should be kept separate.
  • Syntax should not fall far from C, the intention is C interop. More verbose syntax prevents hand translation of C bitfields.
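To illustrate why the bit-level layout has to come from the ABI, consider a 5-bit field `a` followed by a 3-bit field `b` in a single byte. C leaves the allocation order within a storage unit implementation-defined, so both of the packings sketched below are legitimate. The function names are made up for this example.

```rust
/// Packs a 5-bit `a` and a 3-bit `b`, allocating from the least significant bit upward.
pub fn pack_lsb_first(a: u8, b: u8) -> u8 {
    (a & 0x1F) | ((b & 0x07) << 5)
}

/// Packs the same two fields, allocating from the most significant bit downward.
pub fn pack_msb_first(a: u8, b: u8) -> u8 {
    ((a & 0x1F) << 3) | (b & 0x07)
}
```

The same field values produce different bytes under the two schemes (for `a = 1, b = 0`, `0x01` versus `0x08`), which is why reinterpreting a C bitfield struct as a plain integer is only meaningful once the target ABI is fixed.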

ds84182 avatar May 11 '21 00:05 ds84182

C bitfields are a nightmare

Well, that's normal, as it's C xD. Here's a list of issues the Linux kernel experienced with bitfields, betrayed by GCC: https://lwn.net/Articles/478657/. We do not want that to happen in the Rust world :D That's why making this ultra-ergonomic to write is useful.

Andy-Python-Programmer avatar Jun 07 '21 01:06 Andy-Python-Programmer

Macro-generated field emulation (via explicit getter/setter methods) is clunky in comparison to an actual field. Nothing in Stable and Nightly will fix this: there is no DerefMove or DerefSet to simulate properties; there is no replacement for struct construction syntax and patterns. Property-like const fns would be a great addition to the language, for sure.

Why is this need best solved by this instead of DerefMove or DerefSet, then?

workingjubilee avatar Jun 30 '21 03:06 workingjubilee

I have been evaluating the landscape of C compilers and have become much more familiar with the standard. The C23 standard is going to land without enormous improvements to the handling of bitfields per se. There will be some improvements to the ability to specify various sizes, which, if accepted, will give programmers more control in new declarations, but they will not change existing bitfields in the world of C code. That means that bitfields will often remain, essentially, implementation-defined.

So it is important for this RFC to reflect on what, exactly, it means to be "C-compatible", when "C" does not have one definition, nor even 5 (C89, C99, C11, C17, and C23), but one for every single C compiler and for every single target, combinatorically. Often, repr(C) is used to merely enforce stabilized field layout, but adding bitfields could imply wildly different layouts based on which compiler it was compatible with.

Even when we factor in such things as processor-specific ABIs, often the layout of bitfields is ambiguous at best. That creates a unique problem when this much is left open for implementation definition:

The implementer can change their mind.

And that could undermine the kind of stability guarantees that programmers expect from Rust. In the past, vendor actions have significantly impacted Rust's platform support. However, because they affected OS-specific interfaces, or whether a target existed, or something wrapped in abstraction barriers, this hasn't mattered much to the core of the Rust language.

But this RFC, if accepted, could make those changes cut much deeper. The details of how repr(C) works in the presence of a given struct field impacts Rust's language semantics directly regarding memory layout, in ways that can affect programs and programmers.

In a future where Standard C does specify, exactly, how bitfields should be handled, even just enough that we could believe that compilers would at least reliably come to similar conclusions, this RFC would seem useful. Until then, it seems more appropriate to address the C bitfields problem by providing tools that make it easier to solve this in libraries.

workingjubilee avatar Feb 18 '23 07:02 workingjubilee

repr(C) structs in Rust currently serve a dual role: interop with C on the one hand, and a more strictly-/ well-defined layout (for unsafe code to rely on) on the other hand.

Given that context, I think it’s important that features can be (and are) explained in a way that doesn’t require deep familiarity with other programming languages / with C.

I know some basics of C, but I don’t know anything about “bit-fields” in particular. I am deeply familiar with Rust. With this background, this RFC reads very weird to me; basically, I understand almost nothing from the RFC text alone as long as I haven’t read through the complete “reference-level explanation” in detail yet.

But IMO, it certainly shouldn’t be the case that someone deeply familiar with Rust will understand nothing at all from the motivation and guide-level explanations of a RFC alone.

Here’s the limited information that I personally got as an understanding / takeaway from those sections, so you know what you can improve upon.


From the motivation:

The C language has something called “bit-fields”. The Linux kernel uses them. They are hard to understand/calculate/whatever. They have a peculiar syntax using a colon that I’ve never seen before, and they have surprising/weird platform-dependent effects that I cannot even begin to understand without the slightest hint of where this is coming from.

So far this reads like a horribly confusing feature that I wouldn’t want to have in Rust at all if there’s any chance to avoid it and get the necessary interop in a different way. If you want this motivation to give off a different vibe than “wtf is this weirdness I don’t understand it and don’t want it”, then perhaps the motivation should not only motivate “bitfields exist and are hard” but also give some indication of why they’re a useful (and thus used) feature of the C language, what kind of feature they are, and the most basic intuition of what a “bit-field” even is.

From the guide-level explanation:

The RFC proposes some attribute-based syntax that’s supposed to be an equivalent to the C syntax. As to what the syntax means, I shall better learn some C, I guess?

From skimming through the reference-level explanation:

There’s syntax of course, good, I can skip that, since the fact that there’s a new syntax is the only thing that I did understand in the RFC.

There’s some nomenclature and restrictions... alright, restrictions don’t give me much in terms of what a bit-field is in the first place.

Writing this reply as I’m reading more of that section... finally, this is the first time I come across the most crucial piece of information. This should be among the first sentences of the RFC, but instead it’s well hidden in the middle of the “reference-level” section.

Each field annotated with bits(N) occupies N bits of storage.

On that note, the reference-level explanation should probably get some structure that separates the different sections about syntax, restrictions of what types and values can be used, semantics of interoperating with the fields, layout, and possibly more.


Finally, a single note from me about the contents, not the presentation, and I suppose this has been mentioned in the discussion above already, too. I’m perplexed by the premise that

The language reference shall document for each target the layout of structs containing bit-fields.

The intended behavior is that the layout is the same layout as produced by the C system compiler of the target when compiling the corresponding C struct for the same target.

As I mentioned at the beginning of this post, given the dual role of repr(C) in Rust, a layout that is heavily dependent on ... well ... so many factors that I cannot even list them off the top of my head after reading this RFC ... seems quite unfitting, and different from the rest of repr(C) layout. Reading this RFC, this kind of “bitfield” feature would be intended to be used exclusively for C interop; so it’s comparable to… well… probably vararg functions, or extern "C" functions. The difference, however, is that it proposes a whole language feature that does seem possibly useful on its own, whereas (as far as I’m aware) varargs are highly unsafe (and despite being supported by Rust FFI, you cannot actually use the feature within Rust very well) and extern "C" is really only an FFI thing.

The fact that the bitfields feature introduces a full language feature that's usable without unsafe and possibly quite useful on its own – outside of FFI considerations – means that making available only the “weirdly platform-dependent C-compatible” way of doing layout seems surprisingly restrictive to me.

steffahn avatar Feb 18 '23 08:02 steffahn

The current bitfield situation is a potential blocker to using Rust in embedded applications at some organizations cough my employer cough as it makes writing safe, standard, error-free code difficult.

Nearly all C compilers (GCC, Clang, IAR, etc.) support marking bitfields as "Packed" via a #pragma. This tells the compiler to not use the otherwise arcane C rules for formatting bit fields and to not add platform dependent padding. This is used in a large amount of networking code because it means we can trade computation (packing isn't free - more instructions) for denser structures to send over a wire or over the air.

Obviously, we can write macros or functions and a pile of setters and getters and bit math headaches to accomplish this anyway - it's all just bytes at the end of the day. But saying we only need the repr(C) variant to allow for C's mildly cursed padding rules for compatibility isn't right either. So long as the original C is using one of the packing #pragmas, the obvious, not-just-for-compatibility easy to use layout that a Rust oriented new syntax would allow would make translating C code much easier too.

As a bonus, supporting only the not-platform-dependent use case would make it easier for something like bindgen to make something more idiomatic. See https://rust-lang.github.io/rust-bindgen/using-bitfields.html - if you actually run this you'll see it makes quite the mess. - EDIT: It's also already a platform-dependent mess, as bindgen needs to know the target to get the padding right: https://rust-lang.github.io/rust-bindgen/faq.html#how-to-generate-bindings-for-a-custom-target

As is, every option leaves a bit to be desired.

  • Not using any crates means harder to write, easier to make mistakes in bit-math code
  • Using any crate means adding a dependency
    • Deku which looks to me to be the nicest bitfield crate requires at least alloc - a potential non-starter for embedded.
    • Modular Bitfield, which appears to be the most popular option has a variety of problems that this blog post outlines nicely.

So while I'm in support of taking time to get this right, saying "We can leave it up to crates" doesn't seem good enough.

VegaDeftwing avatar Jan 29 '24 17:01 VegaDeftwing

  • Deku which looks to me to be the nicest bitfield crate requires at least alloc - a potential non-starter for embedded.

from looking at the cargo features and the lib.rs, it looks like it might work without alloc, just don't enable the alloc feature

(edit: nevermind, it documents alloc being required on no_std, imo it should have just always used alloc then and not had that feature gate)

programmerjake avatar Jan 29 '24 18:01 programmerjake

I am distinctly not an experienced Rust dev, but if I had to throw my hat in the ring to recommend syntax, it would probably be something like this:

#[endian(little)]
struct Foo {
    a : bool,
    b : u7,
    #[endian(big)]
    d : u24,
    e : [i16; 4],
    f : [u24; 3],
}

Making this dependent on RFC: Generic integers #2581

Where I think things get a bit gross with this is generics and enums. It might be the case that this makes it hard to verify that something is actually a multiple of 8 bits in size, which should probably be enforced (though that's a tradeoff with composability of structs). Some way to specify how many bits an enum should take would be logical, along with a reasonable solution for dealing with values that are out of bounds for that enum. There's also the fun case of how to handle bools - making a 1-bit field that's not a bool feels gross even in C. 🤷‍♂️
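One way to picture the out-of-bounds question: an enum with three variants stored in a 2-bit field leaves one bit pattern unused, so decoding has to be fallible. The following is a minimal sketch under made-up names (`Channel`, `from_bits`, `channel_of`) and an arbitrary field position, not a proposal for the actual syntax.

```rust
/// Three variants in a 2-bit field: the pattern `0b11` has no meaning.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Channel {
    Red = 0,
    Green = 1,
    Blue = 2,
}

impl Channel {
    /// Fallible decoding: returns `None` for raw values that name no variant.
    pub fn from_bits(raw: u8) -> Option<Channel> {
        match raw {
            0 => Some(Channel::Red),
            1 => Some(Channel::Green),
            2 => Some(Channel::Blue),
            _ => None,
        }
    }
}

/// Extracts the 2-bit channel assumed to live in bits 2..=3 of `byte`.
pub fn channel_of(byte: u8) -> Option<Channel> {
    Channel::from_bits((byte >> 2) & 0b11)
}
```

Any built-in design would need to pick some equivalent of this `Option`, whether that is a checked accessor, a panic, or a rule that only fully inhabited enums are allowed.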

VegaDeftwing avatar Jan 29 '24 18:01 VegaDeftwing

The current bitfield situation is a potential blocker to using Rust in embedded applications at some organizations ... as it makes writing safe, standard, error-free code difficult.

As the owner of the gba crate, which is a crate for an embedded device, I'm very suspicious of this claim. I've never had a problem with using integer newtypes and bit-field manipulation methods, either in the creation of the type itself (which can be done quite readably with a macro_rules macro) or in using the type (which ends up reading like normal "builder pattern" code, extremely common in Rust). Perhaps I've got an advantage because the MMIO values don't need to be packed into larger structs with target padding, but even so this seems like a fairly simple thing to handle "properly" once and then never think about again. I even made the bitfrob crate so that all the different bit math things I'd need to do have clear names. I generally do agree that needing a dependency is generally worse than having something built into the language or available in core, but I wouldn't call using a dependency a blocker to adopting Rust.

Lokathor avatar Jan 29 '24 21:01 Lokathor

because the MMIO values

For I/O I think it's not unreasonable to do it via bit shifts and what not. When you've got 100+ different packet types for shooting over a network each of different sizes (which may be many, many bytes large) needing to think to construct and destruct them can get quite tedious, and I don't know of a better way to handle it than C bitfields. Again, I'm far from a Rust pro, but when a not insignificant amount of the application logic is processing and handling these packets, it needs to be ergonomic to do work with their data.
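For a sense of what the hand-written version of one such packet looks like, here is a sketch of an encoder and decoder for a hypothetical 4-byte header. The field names and widths (4 + 4 + 12 + 12 bits) are invented for illustration; the byte order is fixed explicitly as big-endian so that the result does not depend on the compiler or target.

```rust
/// Hypothetical 4-byte packet header; the field widths are made up for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Header {
    pub version: u8,   // 4 bits
    pub kind: u8,      // 4 bits
    pub length: u16,   // 12 bits
    pub sequence: u16, // 12 bits
}

impl Header {
    /// Encodes into network byte order, most significant field first.
    /// Bits that do not fit a field's width are discarded.
    pub fn to_bytes(self) -> [u8; 4] {
        let word = (u32::from(self.version & 0x0F) << 28)
            | (u32::from(self.kind & 0x0F) << 24)
            | (u32::from(self.length & 0x0FFF) << 12)
            | u32::from(self.sequence & 0x0FFF);
        word.to_be_bytes()
    }

    /// Decodes a header previously produced by `to_bytes`.
    pub fn from_bytes(bytes: [u8; 4]) -> Self {
        let word = u32::from_be_bytes(bytes);
        Header {
            version: (word >> 28) as u8,
            kind: ((word >> 24) & 0x0F) as u8,
            length: ((word >> 12) & 0x0FFF) as u16,
            sequence: (word & 0x0FFF) as u16,
        }
    }
}
```

Writing this once is manageable; the tedium described above comes from repeating it, with different widths and offsets, for every one of 100+ packet types.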

VegaDeftwing avatar Jan 29 '24 21:01 VegaDeftwing

If you make accessor methods for each bitpacked "field" you want to simulate, then there's a pretty clear conversion that's easy to remember:

access | field syntax     | accessor method
read   | data.field       | data.field()
write  | data.field = new | data.set_field(new)

Since bitpacked values aren't really held inside other bitpacked values, this simple rule is enough to handle almost any situation. Even if the overall struct for a situation contains two different bitpacked values, just treat each bitpacked value individually and the problem generally remains manageable.
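A small sketch of that rule applied to a struct holding two independent bit-packed values follows. `Control`, `Registers` and the bit positions are hypothetical; the `with_` methods are the by-value, builder-style variant of the `set_` methods.

```rust
/// A bit-packed value: bit 0 is `enabled`, bits 4..=5 are `priority`.
#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
pub struct Control(u16);

impl Control {
    const ENABLED: u16 = 1;
    const PRIORITY_SHIFT: u16 = 4;
    const PRIORITY_MASK: u16 = 0b11 << Self::PRIORITY_SHIFT;

    pub const fn enabled(self) -> bool {
        self.0 & Self::ENABLED != 0
    }

    pub const fn with_enabled(self, on: bool) -> Self {
        if on {
            Self(self.0 | Self::ENABLED)
        } else {
            Self(self.0 & !Self::ENABLED)
        }
    }

    pub const fn priority(self) -> u16 {
        (self.0 & Self::PRIORITY_MASK) >> Self::PRIORITY_SHIFT
    }

    /// Keeps only the low 2 bits of `p`.
    pub const fn with_priority(self, p: u16) -> Self {
        Self((self.0 & !Self::PRIORITY_MASK) | ((p << Self::PRIORITY_SHIFT) & Self::PRIORITY_MASK))
    }

    /// In-place counterpart of `with_priority`, matching the "write" row of the table.
    pub fn set_priority(&mut self, p: u16) {
        *self = self.with_priority(p);
    }
}

/// An ordinary struct holding two bit-packed values, each handled on its own.
#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
pub struct Registers {
    pub control: Control,
    pub status: Control,
}
```

Here `regs.control.set_priority(2)` plays the role of `regs.control.priority = 2`, and the outer struct needs no special support at all.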

Lokathor avatar Jan 29 '24 22:01 Lokathor

Nearly all C compilers (GCC, Clang, IAR, etc.) support marking bitfields as "Packed" via a #pragma. This tells the compiler to not use the otherwise arcane C rules for formatting bit fields and to not add platform dependent padding.

This replaces the "arcane C rules for formatting bit fields" with entirely compiler-specific, non-standard formatting, with no compatibility guarantees. Some platform ABIs go so far as to note that this is hypothetically possible, but it should never be exposed in a public header, ever, and that any such code that does so is nonconforming... right after noting an implementation-defined difference in generated layouts between two C compilers when you do this.

So I disagree with your conclusion:

So while I'm in support of taking time to get this right, saying "We can leave it up to crates" doesn't seem good enough.

...because at least if you use a crate, you have an actual guarantee that you have the same thing on both ends of the wire, as both compilers have to compile Rust correctly. This is the same, basically, as using a C library that does bit-munging exclusively with uint8_t or (uint8_t*, size_t) pairs: the C compiler may not compile such optimally, but at least it will not "miscompile" such because the compiler implemented a compiler-level pragma differently. Your proposed C-level solution requires validating the bit-level layout actually chosen by each compiler... at which point, the amount of validation you are doing means you prooobably could have worked with char* and come out with the same code.

Anyways, if deku using alloc is bad, consider using its dependencies like bitvec more directly (specifically, BitSlice). It is very common in no_std crates to use a simple allocator that allows use of alloc, however.

workingjubilee avatar Jan 30 '24 03:01 workingjubilee

...Now, aside from the note that I really hope you aren't trading data between any copies of armcc and gcc over the wire...

@VegaDeftwing In general, because Rust crates have access to procedural macros, which allow for writing significant syntax extension for the language, when we say "we should let crates handle this", it does not necessarily mean modifying the language is inconceivable. It means that it's currently believed that a library can provide a better API, even a better syntax, without having to PR their changes to the compiler, which allows them to iterate independently.

This is not true for all libraries, as not all code can be generated by simply having rustc dlopen() a library and run the TokenStream through the dlopened library. The work on core::simd, for instance, is unfeasible by such a means, as whether the abstract SIMD code optimizes well is part of the question. And e.g. generic integer support would greatly assist writing such a library to begin with. However, given the problems with reference-to-packed-field soundness, the Cell-like API that Lokathor describes is already de-facto mandatory, and no one's particularly arguing against generic integers, just against tying them to the implementation of bitfields.
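The packed-field problem can be seen with an ordinary `repr(packed)` struct: a field may sit at an unaligned address, so taking a reference to it is not allowed and the value has to be copied out instead. The sketch below uses a made-up `Record` type and shows the two usual ways of reading such a field.

```rust
use std::ptr;

#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct Record {
    pub tag: u8,
    pub value: u32, // at offset 1, so not 4-byte aligned
}

/// Reads `value` by copy; writing `&record.value` instead is rejected by current compilers.
pub fn value_of(record: &Record) -> u32 {
    record.value
}

/// Same read through a raw pointer, without ever creating a reference to the field.
pub fn value_via_ptr(record: &Record) -> u32 {
    // SAFETY: the pointer comes from a valid `&Record`, and the read tolerates misalignment.
    unsafe { ptr::addr_of!(record.value).read_unaligned() }
}
```

A bit-field is an even stronger case, since it may not start on a byte boundary at all, so get/set-style access by value is the only shape that can work regardless of whether the feature lives in a crate or in the language.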

If your corporation needs a better bit-munging library than currently exists, an obvious route suggests itself: contracting a Rust pro for such and worrying about whether the library is suitable for PRing to rustc later.

workingjubilee avatar Jan 30 '24 04:01 workingjubilee