Add Stable Div Rem
closes #11
You may notice that I included a compiler hint in some of the divide-by-zero checks, e.g.:
https://github.com/NCGThompson/ethnum-rs/blob/295dd6d0a0743026c3ca593e750b0268b65e4357/src/uint.rs#L372-L386
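The general shape of such a hint can be sketched as follows. This is not the code at the permalink. It uses a hypothetical `check_divisor_nonzero` function on a bare `[u128; 2]` divisor, with the layout (index 0 = low half, index 1 = high half) assumed for illustration. The hint comes after the real panic check, so it can never be reached and is sound.

```rust
use core::hint::unreachable_unchecked;

/// Hypothetical divide-by-zero check on a 256-bit divisor stored as two u128
/// halves. Assumed layout: index 0 = low half, index 1 = high half.
#[inline]
pub fn check_divisor_nonzero(rhs: &[u128; 2]) {
    // Whole-array comparison: the form LLVM can lower to a single bcmp.
    if *rhs == [0, 0] {
        panic!("attempt to divide by zero");
    }
    // Hint: restate the same fact as per-half comparisons, the form later
    // icmp-based checks (e.g. inside the division routine) use.
    if rhs[0] == 0 && rhs[1] == 0 {
        // SAFETY: the panic above already returned for an all-zero divisor,
        // so this branch can never execute.
        unsafe { unreachable_unchecked() }
    }
}
```

Whether the optimizer actually uses the hint depends on inlining and optimization settings, as discussed below.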
Inequality operators (e.g. <=, >) go through the cmp method, which has a custom implementation based on lexicographical comparison of (u128, u128):
https://github.com/NCGThompson/ethnum-rs/blob/295dd6d0a0743026c3ca593e750b0268b65e4357/src/uint/cmp.rs#L18-L20
The eq method, on the other hand, is derived automatically. It calls eq on each of U256's fields, and U256 has only one: an array of two u128s.
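The contrast between the two can be sketched with a hypothetical stand-in type, `U256Sketch`. It mirrors the single-array-field shape, but the index order (0 = low, 1 = high) is an assumption made here; ethnum's real layout may differ. Equality is derived and compares the whole array, while ordering compares the high halves first and uses the low halves to break ties.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for U256: one field, an array of two u128s.
/// Assumed layout: index 0 = low half, index 1 = high half.
#[derive(Clone, Copy, Debug, PartialEq, Eq)] // derived eq compares self.0 == other.0
pub struct U256Sketch(pub [u128; 2]);

impl U256Sketch {
    pub const fn from_parts(hi: u128, lo: u128) -> Self {
        Self([lo, hi])
    }
    pub const fn hi(&self) -> u128 {
        self.0[1]
    }
    pub const fn lo(&self) -> u128 {
        self.0[0]
    }
}

impl Ord for U256Sketch {
    fn cmp(&self, other: &Self) -> Ordering {
        // Lexicographic comparison of (hi, lo): one comparison per half.
        (self.hi(), self.lo()).cmp(&(other.hi(), other.lo()))
    }
}

impl PartialOrd for U256Sketch {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

The two stay consistent: (hi, lo) pairs compare equal exactly when the arrays are equal.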
In LLVM, the lexicographical comparison is translated to two icmps, while the eq method is translated to a bcmp. The bcmp is very fast and can take advantage of the SIMD registers: when comparing against a constant 0 on x86_64 with 256-bit registers, it becomes vptest ymm, ymm. The icmps are not optimized the same way.
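To look at the difference directly, a pair of hypothetical functions like the ones below can be compiled as a library, e.g. with `rustc --crate-type=lib -C opt-level=3 -C target-feature=+avx2 --emit=asm` (or `--emit=llvm-ir`). One tests for zero through whole-array equality and the other through a lexicographic comparison. The exact output depends on the rustc/LLVM version and flags, so treat this as a starting point for inspection rather than a guaranteed result.

```rust
/// Zero test through whole-array equality (the derived-eq path).
#[inline(never)]
pub fn is_zero_eq(x: &[u128; 2]) -> bool {
    *x == [0u128; 2]
}

/// Zero test through a lexicographic (hi, lo) comparison (the cmp path).
/// Assumed layout: index 0 = low half, index 1 = high half.
#[inline(never)]
pub fn is_zero_cmp(x: &[u128; 2]) -> bool {
    // For unsigned values, "<= 0" is equivalent to "== 0".
    (x[1], x[0]) <= (0, 0)
}
```

Both functions compute the same result; only the generated code is expected to differ.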
However, the compiler seems to elide redundant bcmps only inconsistently, even when they are compiled in the same module. In simple cases it does work. For example,
```rust
// `a: U256` and `core::hint::unreachable_unchecked` are assumed to be in scope.
if a == 0 {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    panic!();
}
```
compiles to a single bcmp with all optimizations on. Also, the compiler doesn't understand that the result of the bcmp implies the result of the icmp on each of the u128s. Notably, the / expressions used in udivmod4 use icmp, so
```rust
// `a: U256` and `core::hint::unreachable_unchecked` are assumed to be in scope.
if a > u128::MAX {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    panic!();
}
_ = U256::new(3) / a;
```
compiles to a 256-bit bcmp and a 128-bit icmp, each with its own panic path, even when udivmod4 is inlined and optimized.
So, you may ask, why not use a custom implementation of eq that includes the compiler hint? I think that is definitely worth considering, but the effect varies with the context and the optimization settings, and I don't know whether it would be a pessimization in some cases. Interestingly, when lto is not set to "fat", the hint makes the biggest difference when comparing a 256-bit value to a 128-bit one, with the hint placed directly in self.eq(u128) rather than in a custom self.eq(Self).
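A custom eq carrying the hint might look roughly like this. It uses a hypothetical `U256Sketch` type with an assumed layout (index 0 = low, index 1 = high) and is not ethnum's code. Both variants are shown: one in the U256-to-U256 eq and one placed directly in the U256-to-u128 eq. In each, the hint's condition contradicts the comparison just made, so it can never be true.

```rust
use core::hint::unreachable_unchecked;

/// Hypothetical stand-in for U256. Assumed layout: index 0 = low, index 1 = high.
#[derive(Clone, Copy, Debug)]
pub struct U256Sketch(pub [u128; 2]);

impl PartialEq for U256Sketch {
    #[inline]
    fn eq(&self, other: &Self) -> bool {
        // Whole-array comparison (the bcmp-friendly form).
        let whole = self.0 == other.0;
        // Hint: equal arrays imply equal halves, stated as per-half compares.
        if whole && (self.0[0] != other.0[0] || self.0[1] != other.0[1]) {
            // SAFETY: unreachable, since `whole` means every element matched.
            unsafe { unreachable_unchecked() }
        }
        whole
    }
}

impl Eq for U256Sketch {}

impl PartialEq<u128> for U256Sketch {
    #[inline]
    fn eq(&self, other: &u128) -> bool {
        // A u128 widened to 256 bits has a zero high half.
        let whole = self.0 == [*other, 0];
        // Hint placed directly in the 256-vs-128 comparison.
        if whole && (self.0[1] != 0 || self.0[0] != *other) {
            // SAFETY: unreachable, since `whole` fixes both halves.
            unsafe { unreachable_unchecked() }
        }
        whole
    }
}
```

Whether either variant helps or hurts depends on inlining, LTO, and opt-level, which is why it is left as an open question here.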
Fixed the alignment. The other change I made was to the docs and doc tests of some methods. One could argue they were better before, but this way is more consistent with std.
I found an error: I made this depend on debug assertions, but in the std library, division overflow (e.g. MIN / -1) is a guaranteed panic even without overflow checks.
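The std behavior can be illustrated with i128 standing in for a 256-bit signed type, so the example stays self-contained without ethnum's I256. `div_euclid_like_std` is a hypothetical name. It checks for division overflow unconditionally rather than behind cfg!(debug_assertions), matching std, where i128::MIN / -1 panics whether or not overflow checks are enabled.

```rust
/// Hypothetical Euclidean division that panics like std does.
pub fn div_euclid_like_std(lhs: i128, rhs: i128) -> i128 {
    if rhs == 0 {
        panic!("attempt to divide by zero");
    }
    // Not gated on cfg!(debug_assertions): std panics here even with
    // overflow checks disabled.
    if lhs == i128::MIN && rhs == -1 {
        panic!("attempt to divide with overflow");
    }
    let q = lhs / rhs; // truncated quotient
    if lhs % rhs < 0 {
        // Adjust so the remainder is non-negative.
        if rhs > 0 { q - 1 } else { q + 1 }
    } else {
        q
    }
}
```

The std methods themselves behave the same way: `i128::MIN.div_euclid(-1)` panics in both debug and release builds.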