nonmax
nonmax copied to clipboard
Implement 8 and optionally 16 bit integers with build.rs generated enum
This PR starts off by splitting the fundamental structure and functions into two different macros, to allow for a custom definition of the internals without having to rewrite all the helpers and trait implementations.
I then changed NonMaxU8
, NonMaxI8
(and optionally NonMaxU16
/NonMaxI16
) to be custom definitions, using a custom enum with NonMax::MAX
variants to allow us to choose where the niche is stored, preventing having to generate xor
instructions for new
and get
, as tested and validated on godbolt.
16 bit enum repr is limited to a feature flag, as it extends build time by ~12 seconds on a 5800x3d, and is the reason that the new
method is implemented as a transmute
(using a match extends build times to minutes!)
~~I'm going to rewrite to use build.rs, to avoid bloating the codebase and make the codegen logic live in the repo instead of the Python script I used to generate the repr enums.~~ Done!
Hey, thanks for the PR! I've been travelling with bad internet and haven't had time until now to sit down and look at this.
This is a super interesting idea! It's neat that we can cut down on some instructions at runtime by pushing the work to build time. Controlling the representation of a niche by representing numbers as an enum is clever.
I am concerned about increasing the complexity of this crate's implementation. The build time increase for the 16-bit variants is a lot and I don't know if the special case for 8-bit integers is worthwhile on its own. I wonder if there is a compiler feature we could help with that would make defining custom niches easier. Conversely, I wonder if there are any edge cases we might hit in the compiler by having enums with so many variants.
What is the runtime performance difference of this change? Do you have an application where this change makes a difference on your runtime performance?
I don't have a situation in which this measurably improves runtime performance, I'm just really interested in overall software micro-optimization and this is an interesting efficiency improvement. This would make nonmax
a truly pure-gain abstraction, at least for 8 bit and 16 bit, instead of a trade off of niches/validity and the extra CPU instruction per access or store.
We could use the std internal feature that allows for NonZero
to exist, but that would lock it behind nightly. That could be implemented in a different PR under a nightly
feature, but for now I want to make this improvement as open to everyone.
This definitely increases complexity in the nonmax repo, but it's mostly self-contained and not a very complex project to start with. This is already running into the compiler's inability to handle large enums with the 16 bit feature, but it causes no errors at this size, just a slow build time that I may look into improving if I get into working on the compiler soon.
I've run into a bit of an issue, this causes MSRV to be bumped to 1.57 since this is relying on const transmute
(1.56) and const unreachable_unchecked
(1.57). Shall I include the MSRV bump in this PR, open it separately, or do something else?
I've just managed to reduce build times from 12 seconds to around 7 to 8 seconds, I'm currently running rustc profiling tools to fix the compiler's handling of this, hopefully the feature should become redundant soon.
@LPGhatguy can we get this merged?