WideIntProofOfConcept
WideIntProofOfConcept copied to clipboard
Trying to get perfect codegen for big-integer math.
Wide Integer Math (Proof of Concept)
This header provides a class template Wider<T>, such that Wider<uint64_t>
acts like an unsigned 128-bit type, Wider<Wider<uint64_t>> acts like an unsigned
256-bit type, and so on. Wider<T> overloads all the arithmetic operators,
and also provides a free function countleadingzeros(w).
A sufficiently smart compiler should produce the same codegen for Wider<uint64_t>
as it does for the built-in primitive type __uint128_t. Any significant differences
in codegen (in either direction) should probably be filed as "missed optimization"
bugs against the relevant compiler.
As of April 2020, Clang supports
a built-in type unsigned _ExtInt(n) that implements n-bit arithmetic for any n at all, even
non-powers of two. A sufficiently smart compiler should produce the same codegen for Wider<Wider<...>>
as it does for unsigned _ExtInt(n).
The following tables show naïve instruction counts (not counting labels, but counting the ret)
for each member function of wider_tests::Tests<W>. The letter "P" indicates codegen
that appears perfect as far as I know. The word "call" indicates failure to completely
inline the code (usually because it's so big).
"__udivti3" and "__umodti3" indicate that certain arithmetic operations on __uint128_t
are delegated to a library function. The library function is macro-optimized with
many special cases (division by small integers, division by integers with many trailing zeros,
et cetera). In contrast, Wider's operator/ omits these special cases,
resulting in smaller code but almost certainly slower code in common situations.
All numbers are produced by Godbolt Compiler Explorer, using -O3 -std=c++14, on GCC trunk and Clang trunk.
128-bit math using __uint128_t, unsigned _BitInt(128), and Wider<uint64_t>:
| Test name | Clang uint128 |
Clang _BitInt |
Clang W<u64> |
GCC uint128 |
GCC W<u64> |
|---|---|---|---|---|---|
| preinc | 3 P | 3 P | 3 P | 3 P | 6 |
| postinc | 11 | 11 | 11 | 11 | 12 |
| predec | 3 P | 3 P | 3 P | 3 P | 6 |
| postdec | 11 | 11 | 11 | 11 | 13 |
| plus | 5 P | 5 P | 5 P | 5 P | 11 |
| pluseq | 5 P | 5 P | 5 P | 5 P | 6 |
| minus | 5 P | 5 P | 5 P | 5 P | 14 |
| minuseq | 5 P | 5 P | 5 P | 5 P | 9 |
| mul | 11 P | 11 P | 11 P | 11 P | 11 P |
| muleq | 11 P | 11 P | 11 P | 11 P | 11 P |
| div | __udivti3 | __udivti3 | 47 | __udivti3 | 57 |
| diveq | __udivti3 | __udivti3 | 47 | __udivti3 | 57 |
| mod | __umodti3 | __umodti3 | 38 | __umodti3 | 53 |
| modeq | __umodti3 | __umodti3 | 38 | __umodti3 | 53 |
| xor_ | 5 P | 5 P | 5 P | 5 P | 5 P |
| xoreq | 5 P | 5 P | 5 P | 5 P | 5 P |
| and_ | 5 P | 5 P | 5 P | 5 P | 5 P |
| andeq | 5 P | 5 P | 5 P | 5 P | 5 P |
| or_ | 5 P | 5 P | 5 P | 5 P | 5 P |
| oreq | 5 P | 5 P | 5 P | 5 P | 5 P |
| shl | 13 | 13 | 12 P | 12 P | 24 |
| shleq | 13 | 13 | 12 P | 12 P | 24 |
| shr | 13 | 13 | 12 P | 12 P | 23 |
| shreq | 13 | 13 | 12 P | 12 P | 23 |
| clz | 7 | 1 | 7 | 13 | 11 |
| lt | 6 P | 6 P | 6 P | 6 P | 9 |
| leq | 6 P | 6 P | 6 P | 7 | 9 |
| gt | 6 P | 6 P | 6 P | 7 | 9 |
| geq | 6 P | 6 P | 6 P | 6 P | 9 |
| eq | 6 P | 7 | 7 | 7 | 7 |
| neq | 6 P | 7 | 7 | 7 | 7 |
| not_ | 4 P | 4 P | 4 P | 4 P | 4 P |
| bool_ | 4 P | 4 P | 4 P | 4 P | 4 P |
| neg | 5 | 5 | 5 | 4 P | 13 |
| flip | 3 P | 3 P | 5 | 3 P | 5 |
256-bit math using unsigned _ExtInt(256) and Wider<Wider<uint64_t>>:
| Test name | Clang 13 _ExtInt |
Clang W<W<u64>> |
GCC W<W<u64>> |
|---|---|---|---|
| preinc | 5 P | 5 P | 11 |
| postinc | 23 | 13 | 18 |
| predec | 5 P | 5 P | 15 |
| postdec | 23 | 13 | 20 |
| plus | 9 P | 9 P | 20 |
| pluseq | 9 P | 9 P | 13 |
| minus | 9 P | 9 P | 24 |
| minuseq | 9 P | 9 P | 17 |
| mul | 60 | 63 | 64 |
| muleq | 62 | 63 | 64 |
| div | 1 | 173 call | 216 call |
| diveq | 1 | 173 call | 216 call |
| mod | 1 | 173 call | 198 |
| modeq | 1 | 166 call | 198 |
| xor_ | 9 P | 9 P | 9 P |
| xoreq | 9 P | 9 P | 9 P |
| and_ | 9 P | 9 P | 9 P |
| andeq | 9 P | 9 P | 9 P |
| or_ | 9 P | 9 P | 9 P |
| oreq | 9 P | 9 P | 9 P |
| shl | 64 | 30 | 60 |
| shleq | 64 | 30 | 60 |
| shr | 61 | 33 | 64 |
| shreq | 61 | 33 | 64 |
| clz | 1 | 17 | 29 |
| lt | 10 P | 10 P | 15 |
| leq | 10 P | 10 P | 15 |
| gt | 10 P | 10 P | 15 |
| geq | 10 P | 10 P | 15 |
| eq | 13 P | 13 P | 13 P |
| neq | 13 P | 13 P | 13 P |
| not_ | 7 | 9 | 6 P |
| bool_ | 7 | 9 | 6 P |
| neg | 11 | 11 | 21 |
| flip | 5 P | 8 | 8 |
512-bit math using unsigned _ExtInt(512) and Wider<Wider<Wider<uint64_t>>>:
| Test name | Clang 13 _ExtInt |
Clang W<W<W<u64>>> |
GCC W<W<W<u64>>> |
|---|---|---|---|
| preinc | 9 | 9 | 23 |
| postinc | 45 | 25 | 35 |
| predec | 9 | 9 | 27 |
| postdec | 45 | 25 | 38 |
| plus | 17 | 17 | 42 |
| pluseq | 17 | 17 | 25 |
| minus | 17 | 17 | 46 |
| minuseq | 17 | 17 | 31 |
| mul | 274 | 300 call | 258 |
| muleq | 275 | 300 call | 258 |
| div | 1 | 467 call | 512 call |
| diveq | 1 | 467 call | 512 call |
| mod | 1 | 467 call | 473 |
| modeq | 1 | 456 call | 479 call |
| xor_ | 17 | 17 | 17 |
| xoreq | 17 | 17 | 17 |
| and_ | 17 | 17 | 17 |
| andeq | 17 | 17 | 17 |
| or_ | 17 | 17 | 17 |
| oreq | 17 | 17 | 17 |
| shl | 347 | 133 call | 137 |
| shleq | 347 | 133 call | 137 |
| shr | 362 | 79 | 168 |
| shreq | 362 | 79 | 168 |
| clz | 1 | 39 | 71 |
| lt | 18 | 18 | 28 |
| leq | 18 | 18 | 29 |
| gt | 18 | 18 | 29 |
| geq | 18 | 18 | 28 |
| eq | 25 | 21 | 26 |
| neq | 25 | 21 | 26 |
| not_ | 13 | 13 | 10 |
| bool_ | 13 | 13 | 10 |
| neg | 23 | 23 | 37 |
| flip | 9 | 14 | 14 |