https://github.com/firecracker-microvm/firecracker/pull/3104 must be merged first

Reason for This PR

Working with the large number of values smaller than u8 required for CPUID quickly becomes untenable. With thousands of constants denoting offsets.

bitflags does not support fields which are required for CPUID. To use bitflags here would require high hundreds (possibly 1000+) of additional loosely associated constants to reference these fields within the bitflag structures. This would form a large and difficult to read implementation.

Description of Changes

Introduces bit-fields and bit-fields-macros crates to support handling these values. These function as one crate[^one] inacting a similar role to bitflags except with additional support for fields within bytes used for unsigned integer values.

[^one]: A crates cannot export both procedural-macros and non-procedural-macros, so it requires 2 crates acting as 1, like serde and serde-derive

[ ] This functionality can be added in rust-vmm.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

PR Checklist

[x] All commits in this PR are signed (git commit -s).
[x] The issue which led to this PR has a clear conclusion.
[x] This PR follows the solution outlined in the related issue.
[x] The description of changes is clear and encompassing.
[x] Any required documentation changes (code and docs) are included in this PR.
[x] Any newly added unsafe code is properly documented.
[x] Any API changes follow the Runbook for Firecracker API changes.
[x] Any user-facing changes are mentioned in CHANGELOG.md.
[ ] All added/changed functionality is tested.

Aug 24 '22 18:08 JonathanWoollett-Light

This PR needs a better motivation on why these new crates are needed. We already use the bitflags crate. We should outline why this is different and what differences does this bring.

I've added a short justification.

It also seems that the commitmsg is empty. We should add these details in the commit message as well.

Added commit message.

Since there are 2 separate crates being added, we can split the commit in 2 and have 2 commits, each one for each crate.

Added a point about this, that these should be used as 1 crate.

Sep 05 '22 18:09 JonathanWoollett-Light

The current failures in CI are a result of the currently not explicitly allowed license Unicode-DSF-2016 being present in a transitive dependency. I am looking into this.

Sep 06 '22 09:09 JonathanWoollett-Light

The breaking up of features for the crate given the size of the PR does seem to be over-engineering the crate quite a bit. If it is a crate within the Firecracker workspace, I would expect the entire feature set to be required to be built into the binary. It falls beyond the scope of requirements to write a generalized bit-fields library, and does make the PR harder to vet for the requirement of supporting CPUID operations. (at least for a first pass)

I would note that these features where initially implemented for development (https://github.com/firecracker-microvm/firecracker/pull/3105#discussion_r967035038). They are useful for isolating functionality in debugging and development.

They also act as documentation like a comment // This relates to X and offer a degree of customization in usability.

Sep 13 '22 09:09 JonathanWoollett-Light

The only major thing that's missing is the unit tests for the procedural macro. I guess that in some parts the code can be modularised even further but it think this can come from writing the unit tests.

Could you give an example of what sort of test you are looking for here?

At the moment when running grcov on src/bit_fields there is >80% coverage.

Sep 16 '22 10:09 JonathanWoollett-Light

Due to outstanding issues with kcov, I have converted this back to a draft.

Oct 19 '22 09:10 JonathanWoollett-Light

StructParser (which could also be renamed to BitFieldParser maybe) which does the initial parsing of the structure and ends up with 3 main parts:

struct name

struct type

fields

    // Divide structure into 3 main parts:
    // name, data type and fields.
    // However, we also keep around the representation of the structure type.
    let bf_parser = match StructParser::new(item) {
        Ok(bf) => bf,
        Err(err) => {
            abort_call_site!("Cannot parse bit field input: {}", err.to_string());
        }
    };

Unfortunately in doing this we degrade the usability of the macro. A lot of errors now use Span::call_site() rather than the specific Spans for the problematic token/s.

I have played around with trying to implement changes like you suggest but I haven't been able to implement something satisfactory. I tend to find keeping the macro linear with as little indirection as possible offers the best readability and overall minimal code. I can continue to look at this but would appreciate any further suggestions.

Oct 20 '22 10:10 JonathanWoollett-Light

To summary what we just talked about off-line, I was imagining a very simple top-down parser like this (in a mix of rust and pseudocode since I am not familiar with the exact APIs of the proc_macro crate). I think this also goes in the direction @dianpopa envisioned.

struct Parser {
    input: Iter<Token>
    look_ahead: Option<Token>
}

impl Parser {
  /// Verifies that the next token is of the specified type, and returns it if it is
  ///  You'll potentially want variants such as `next_ident` that return a string instead of a generic token.
  #[must_use]
  fn next(&mut self, expected: TokenType) -> Result<Token, ParserError> {
    let to_return = match self.look_ahead.take() {
      Some(token) if token.type() == expected => token,
      Some(_) => return Err(UnexpectedToken),
      None => return Err(EndOfInput)
    };
    self.look_ahead = self.input.next();
    Ok(to_return);
  }

  /// Verifies that the next token is of the specified type, but discards it (useful if just matching symbols for instance)
  fn expect(&mut self, expected: TokenType) -> Result<(), ParserError> {
    self.next().map(|_|())
  }

  fn peek(&self) -> Option<&Token> {
    self.look_ahead.as_ref()
  }

  fn new(stream: TokenStream) -> Self {
    let mut input = stream.into_iter();
    let look_ahead = iter.next();
    Parser { input, look_ahead }
  }

  fn parse() -> Result<...> {
    let struct_name = self.next_ident()?;
    let mut fields = Vec::new();

    // match an arrow (=>)
    self.expect(TokenType::EqualSign)?;
    self.expect(TokenType::ClosingAngleBracket)?;

    while self.peek().ok()?.ttype != TokenType::ClosingCurlyBracket {
        // parse each field, put into vec, etc.
    }
  
    Ok(BitFieldRepr {struct_name, fields})
  }
}

this should be very easy to test and is also readable and extensible. Additionally, you should get good error reporting out of it. Potentially you'll have to actually do some recursive decent if you need to attach errors to whole token trees instead of individual tokens, but this should not be a problem.

From you BitFieldRepr you can then have methods that use quote! to construct bite-sized pieces of our output (e.g. each struct and impl on its own), and you can just stitch those together in the actual #[proc_macro] function.

Oct 25 '22 10:10 roypat

I am surprised and a little concerned that there aren't existing libraries that allow bit/bit-field handling. If we are convinced that there are no publically available alternatives to an inhouse bit-fields library, then given how generalized this crate is, we should strongly consider pulling it outside of the Firecracker workspace to avoid rebuilding the bit-fields crate (when changes to it directly should be rare).

I just gave crates.io a quick search and found this library that seems quite popular (>400k download) and offers very similar functionality. Is there a reason we do not use it?

Oct 26 '22 08:10 roypat

I am surprised and a little concerned that there aren't existing libraries that allow bit/bit-field handling. If we are convinced that there are no publically available alternatives to an inhouse bit-fields library, then given how generalized this crate is, we should strongly consider pulling it outside of the Firecracker workspace to avoid rebuilding the bit-fields crate (when changes to it directly should be rare).

I just gave crates.io a quick search and found this library that seems quite popular (>400k download) and offers very similar functionality. Is there a reason we do not use it?

The primary problem here is it does not efficiently support exotically sized bit ranges, in CPUID some values might be 3 bits or 13 bits etc.

Other issues are:

Small implementations: Given https://docs.rs/modular-bitfield/0.11.2/modular_bitfield/#generated-implementations much of the functionality we would need is missing and implementing a safe wrapper here would be difficult.
Unmaintained: 2 years since last commit or reply to issue or PR. Multiple PRs and issues open without any comment.

Verbose: It requires more lines to write the same things:

    bit_fields::bitfield!(ExampleBitField,u32,{
     /// RANGE1 bit field
     RANGE1: 0..1,
     /// SSE bit flag
     SSE: 2,
     /// SSE1 bit flag
     SSE1: 4,
     /// RANGE2 bit field
     RANGE2: 5..7,
 });

becomes:

 #[bitfield(bits=32)]
 pub struct MyFourBytes {
     /// RANGE1 bit field
     RANGE1: /* and even more to specify this type */,
     /// SSE bit flag
     SSE: bool,
     #[skip] __: B1
     /// SSE1 bit flag
     SSE1: bool,
     #[skip] __: B1
     /// RANGE2 bit field
     RANGE2: /* and even more to specify this type */,
 }

this adds 100s of lines to CPUID as many of the bit fields are sparsely populated requiring a lot of #[skip] __: B1.

Oct 26 '22 09:10 JonathanWoollett-Light

I am surprised and a little concerned that there aren't existing libraries that allow bit/bit-field handling. If we are convinced that there are no publically available alternatives to an inhouse bit-fields library, then given how generalized this crate is, we should strongly consider pulling it outside of the Firecracker workspace to avoid rebuilding the bit-fields crate (when changes to it directly should be rare).

All dependencies including local crates get rebuilt the same, So it shouldn't have any affect on compile time if we moved it to its own repository. I think moving this outside the Firecracker repo would add extra work and I don't see any specific gain (if we chose to we could still publish it as crate in the workspace). I tend to be against adding additional repositories as it complicates the org.

Oct 26 '22 09:10 JonathanWoollett-Light

I am surprised and a little concerned that there aren't existing libraries that allow bit/bit-field handling. If we are convinced that there are no publically available alternatives to an inhouse bit-fields library, then given how generalized this crate is, we should strongly consider pulling it outside of the Firecracker workspace to avoid rebuilding the bit-fields crate (when changes to it directly should be rare).

All dependencies including local crates get rebuilt the same, So it shouldn't have any affect on compile time if we moved it to its own repository. I think moving this outside the Firecracker repo would add extra work and I don't see any specific gain (if we chose to we could still publish it as crate in the workspace). I tend to be against adding additional repositories as it complicates the org.

I'm not sure org complexity is really a concern. For example crates like micro-http and versionize are removed and I would not expect them to get rebuilt each time I build Firecracker, and this is an appropriate separation in my opinion. Conversely, any changes or feature enhancements to Bitfields would need to go through Firecracker's CI as well, but the crate's functional responsibilities are general and fall outside of the hypervisor world in general.

I realize this set of changes has come far down the line already in scope of this PR, so I'm not keen to push on radical changes opposing the PR. Principally though, having Firecracker's workspace bloat with generic code needing compilation isn't ideal.

Oct 26 '22 11:10 mattschlebusch

There are some more areas where I think the amount of generated code can be reduced by clever trait impls. If we had a bit field trait as follows

pub trait BitField {
  type Base;

  fn field(&self) -> &Self::Base;
  fn field_mut(&self) -> &mut Self::Base;
}

then we could do impl<T: BitField<Base=u32>> BitIndex<u32, N> for T {} for N = 0..31, instead of generating those 31 impls every time a bit field is defined. If we introduce a MASK constant to this trait (as above), a bunch of other methods the proc macro generates atm could become generic impls.

Nov 18 '22 17:11 roypat

I also finally had a look at whether proc-macros are necessary to implement this functionality, or whether traditional macros suffice, and the answer is that proc-macros are indeed necessary. Traditional macros break down when trying to append suffices to idents. This is needed to generate the *_mut bit accessors. However, there exists a third-part proc-macro, paste, specifically for this use case. So if we do indeed manage to extract a lot of the generated code into generic trait impls, I would favor an implementation such as the following (prototyped one) over the current proc-macro

macro_rules! bitfield2 {
    (bitfield $name: ident: $base: ty {
        $(
            $(#[$attr: meta])*
            $field_name: ident: $start: tt$(..$end: tt)?
        ),*
    }) => {
        pub struct $name(pub $base);

        impl $name {
            $(
                generate_accessors!($($attr)*, $name, $base, $field_name, $start$(..$end)?);
            )*
        }
    };
}

macro_rules! generate_accessors {
    ($($attr: meta$)*, $bitfield_name: ident, $base_type: ty, $field_name: ident, $bit: tt) => {
        paste::paste! {
            $(#[$attr])*
            pub fn $field_name(&self) -> bit_fields::Bit<u32, $bit> {
                bit_fields::Bit(&self.0)
            }

            $(#[$attr])*
            pub fn [< $field_name _mut >] (&mut self) -> bit_fields::BitMut<u32, 2u8> {
                bit_fields::BitMut(&mut self.0)
            }
        }
    };

    ($($attr: meta$)*, $bitfield_name: ident, $base_type: ty, $field_name: ident, $start: tt..$end: tt) => {
        paste::paste! {
            $(#[$attr])*
            pub fn $field_name(&self) -> bit_fields::BitRange<u32, $start, $end> {
                bit_fields::BitRange(&self.0)
            }

            $(#[$attr])*
            pub fn [< $field_name _mut >](&mut self) -> bit_fields::BitRangeMut<u32, $start, $end> {
                bit_fields::BitRangeMut(&mut self.0)
            }
        }
    }
}

bitfield2! {
    bitfield ExampleBitField2: u32 {
        // RANGE1 Bit Field
        RANGE1: 0..1,

        // SSE Bit flag
        SSE: 2
    }
}

Nov 18 '22 17:11 roypat

There are some more areas where I think the amount of generated code can be reduced by clever trait impls I

There are some more areas where I think the amount of generated code can be reduced by clever trait impls. If we had a bit field trait as follows
pub trait BitField {
  type Base;

  fn field(&self) -> &Self::Base;
  fn field_mut(&self) -> &mut Self::Base;
}
then we could do impl<T: BitField<Base=u32>> BitIndex<u32, N> for T {} for N = 0..31, instead of generating those 31 impls every time a bit field is defined. If we introduce a MASK constant to this trait (as above), a bunch of other methods the proc macro generates atm could become generic impls.

See https://github.com/firecracker-microvm/firecracker/pull/3105#discussion_r1027677654

Nov 21 '22 08:11 JonathanWoollett-Light

I think we should move this functionality all in one crate because it is weird for bit-fields-macros to generate code that calls into bit-fields when the crate dependency is actually the other way around. I am talkign about this: https://github.com/firecracker-microvm/firecracker/pull/3105/files#diff-db7421964bc6c93ef265a89cff62f1c7ea9e95748ee3c90755b492ed516878bbR142.

This functionality cannot be in 1 crate, crates which export proc-macros cannot export any other items.

Dec 29 '22 09:12 JonathanWoollett-Light

I think we should move this functionality all in one crate because it is weird for bit-fields-macros to generate code that calls into bit-fields when the crate dependency is actually the other way around. I am talkign about this: https://github.com/firecracker-microvm/firecracker/pull/3105/files#diff-db7421964bc6c93ef265a89cff62f1c7ea9e95748ee3c90755b492ed516878bbR142.

This functionality cannot be in 1 crate, crates which export proc-macros cannot export any other items.

I saw crosvm has all in 1 crate, maybe it s worth taking a look since they also used proc-macros.

Dec 29 '22 12:12 dianpopa

I think we should move this functionality all in one crate because it is weird for bit-fields-macros to generate code that calls into bit-fields when the crate dependency is actually the other way around. I am talkign about this: https://github.com/firecracker-microvm/firecracker/pull/3105/files#diff-db7421964bc6c93ef265a89cff62f1c7ea9e95748ee3c90755b492ed516878bbR142.

This functionality cannot be in 1 crate, crates which export proc-macros cannot export any other items.

I saw crosvm has all in 1 crate, maybe it s worth taking a look since they also used proc-macros.

They are using 2 nested crates, the 1st is at https://github.com/dgreid/crosvm/tree/main/bit_field and the 2nd is at https://github.com/dgreid/crosvm/tree/main/bit_field/bit_field_derive. We could also do this, it shouldn't make any difference, I think its just a question of style.

Dec 29 '22 12:12 JonathanWoollett-Light

firecracker
firecracker copied to clipboard

CPUID (2/3): Bit fields

https://github.com/firecracker-microvm/firecracker/pull/3104 must be merged first

Reason for This PR

Description of Changes

License Acceptance

PR Checklist

firecracker firecracker copied to clipboard

CPUID (2/3): Bit fields

https://github.com/firecracker-microvm/firecracker/pull/3104 must be merged first

Reason for This PR

Description of Changes

License Acceptance

PR Checklist

firecracker
firecracker copied to clipboard