nonmax icon indicating copy to clipboard operation
nonmax copied to clipboard

Implement 8 and optionally 16 bit integers with build.rs generated enum

Open GnomedDev opened this issue 7 months ago • 6 comments

This PR starts off by splitting the fundamental structure and functions into two different macros, to allow for a custom definition of the internals without having to rewrite all the helpers and trait implementations.

I then changed NonMaxU8, NonMaxI8 (and optionally NonMaxU16/NonMaxI16) to be custom definitions, using a custom enum with NonMax::MAX variants to allow us to choose where the niche is stored, preventing having to generate xor instructions for new and get, as tested and validated on godbolt.

16 bit enum repr is limited to a feature flag, as it extends build time by ~12 seconds on a 5800x3d, and is the reason that the new method is implemented as a transmute (using a match extends build times to minutes!)

GnomedDev avatar Jan 05 '24 12:01 GnomedDev

~~I'm going to rewrite to use build.rs, to avoid bloating the codebase and make the codegen logic live in the repo instead of the Python script I used to generate the repr enums.~~ Done!

GnomedDev avatar Jan 05 '24 13:01 GnomedDev

Hey, thanks for the PR! I've been travelling with bad internet and haven't had time until now to sit down and look at this.

This is a super interesting idea! It's neat that we can cut down on some instructions at runtime by pushing the work to build time. Controlling the representation of a niche by representing numbers as an enum is clever.

I am concerned about increasing the complexity of this crate's implementation. The build time increase for the 16-bit variants is a lot and I don't know if the special case for 8-bit integers is worthwhile on its own. I wonder if there is a compiler feature we could help with that would make defining custom niches easier. Conversely, I wonder if there are any edge cases we might hit in the compiler by having enums with so many variants.

What is the runtime performance difference of this change? Do you have an application where this change makes a difference on your runtime performance?

LPGhatguy avatar Jan 18 '24 18:01 LPGhatguy

I don't have a situation in which this measurably improves runtime performance, I'm just really interested in overall software micro-optimization and this is an interesting efficiency improvement. This would make nonmax a truly pure-gain abstraction, at least for 8 bit and 16 bit, instead of a trade off of niches/validity and the extra CPU instruction per access or store.

We could use the std internal feature that allows for NonZero to exist, but that would lock it behind nightly. That could be implemented in a different PR under a nightly feature, but for now I want to make this improvement as open to everyone.

This definitely increases complexity in the nonmax repo, but it's mostly self-contained and not a very complex project to start with. This is already running into the compiler's inability to handle large enums with the 16 bit feature, but it causes no errors at this size, just a slow build time that I may look into improving if I get into working on the compiler soon.

GnomedDev avatar Jan 18 '24 18:01 GnomedDev

I've run into a bit of an issue, this causes MSRV to be bumped to 1.57 since this is relying on const transmute (1.56) and const unreachable_unchecked (1.57). Shall I include the MSRV bump in this PR, open it separately, or do something else?

GnomedDev avatar Jan 18 '24 20:01 GnomedDev

I've just managed to reduce build times from 12 seconds to around 7 to 8 seconds, I'm currently running rustc profiling tools to fix the compiler's handling of this, hopefully the feature should become redundant soon.

GnomedDev avatar Jan 18 '24 23:01 GnomedDev

@LPGhatguy can we get this merged?

GnomedDev avatar Mar 14 '24 17:03 GnomedDev