intx
streamline/speed up addc/subc by using in/out carry parameter
I believe this version is significantly faster!
I benchmarked with GCC 11. In general, we would need to split this into smaller changes (I can do this myself) for better analysis. Some quick takeaways:
- the comparison optimization looks promising; if we can show it is always faster than subtraction, this can land,
- multiplication looks faster,
- the basic add/sub got slower; I plan to inspect the assembly to see what happened.
We should also check Clang. It can optimize add/sub perfectly, so as long as that stays the case we can go ahead with changes that make things easier for GCC.
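For context, the change in question can be sketched roughly like this: an add/sub primitive that threads the carry/borrow through an in/out parameter, so multi-word operations become a simple carry chain. The names and signatures below are illustrative only, not intx's actual API.

```cpp
#include <cstdint>

// Hypothetical sketch of add/sub-with-carry primitives using an in/out
// carry parameter, in the spirit of the PR title (not intx's real code).
inline uint64_t addc(uint64_t x, uint64_t y, bool& carry) noexcept
{
    const uint64_t s = x + y + carry;
    // Carry out iff the true sum x + y + carry_in overflowed 64 bits.
    carry = s < x || (s == x && carry);
    return s;
}

inline uint64_t subc(uint64_t x, uint64_t y, bool& borrow) noexcept
{
    const uint64_t d = x - y - borrow;
    // Borrow out iff x < y + borrow_in.
    borrow = x < y || (x == y && borrow);
    return d;
}

// Multi-word addition then reduces to a carry chain over the limbs
// (little-endian limb order assumed here).
inline void add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) noexcept
{
    bool carry = false;
    for (int i = 0; i < 4; ++i)
        r[i] = addc(a[i], b[i], carry);
}
```

The hope behind such a shape is that the compiler recognizes the chain and emits adc/sbb sequences instead of materializing the carry flag in a register between limbs.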
benchmark  cpu_diff  wall_diff  cpu_master[ns]  cpu_new[ns]  wall_master[ns]  wall_new[ns]
clz<uint32_t, intx::clz>_mean -0.0031 -0.0031 0 0 0 0
clz<uint32_t, intx::clz_generic>_mean +0.0005 +0.0005 1 1 1 1
clz<uint64_t, intx::clz>_mean +0.3459 +0.3459 0 1 0 1
clz<uint64_t, intx::clz_generic>_mean +0.0008 +0.0008 3 3 3 3
div_normalize<internal::normalize>_mean -0.0001 -0.0001 11 11 11 11
reciprocal<uint64_t, neg>_mean +0.0014 +0.0014 0 0 0 0
reciprocal<uint64_t, reciprocal_naive>_mean +0.0012 +0.0012 19 19 19 19
reciprocal<uint64_t, reciprocal_2by1>_mean -0.0002 -0.0002 4 4 4 4
reciprocal<uint64_t, reciprocal_2by1_noinline>_mean +0.0010 +0.0010 4 4 4 4
reciprocal<uint128, reciprocal_3by2>_mean +0.0012 +0.0012 7 7 7 7
reciprocal<uint128, reciprocal_3by2_noinline>_mean +0.0072 +0.0072 6 6 6 6
udiv64<nop>_mean +0.0018 +0.0018 1 1 1 1
udiv64<udiv_by_reciprocal>_mean +0.0015 +0.0015 7 7 7 7
udiv64<udiv_native>_mean -0.0002 -0.0002 7 7 7 7
udiv64<soft_div_unr>_mean +0.0137 +0.0137 7 7 7 7
udiv64<soft_div_unr_unrolled>_mean +0.0190 +0.0190 8 8 8 8
udiv128<gcc_>/0_mean -0.0005 -0.0005 25 25 25 25
udiv128<gcc_>/1_mean -0.0017 -0.0017 24 24 24 24
udiv128<gcc_>/2_mean -0.0007 -0.0007 29 29 29 29
udiv128<gcc_>/3_mean +0.0027 +0.0027 21 21 21 21
udiv128<intx_>/0_mean +0.0047 +0.0047 16 16 16 16
udiv128<intx_>/1_mean +0.0035 +0.0035 16 16 16 16
udiv128<intx_>/2_mean +0.0023 +0.0023 13 13 13 13
udiv128<intx_>/3_mean -0.0006 -0.0006 13 13 13 13
udiv128<gmp_>/0_mean +0.0010 +0.0010 24 24 24 24
udiv128<gmp_>/1_mean +0.0046 +0.0045 25 25 25 25
udiv128<gmp_>/2_mean -0.0007 -0.0007 18 18 18 18
udiv128<gmp_>/3_mean -0.0004 -0.0004 14 14 14 14
umul128<intx::uint128, intx::umul>_mean -0.0021 -0.0022 0 0 0 0
div<uint256, udivrem>/64_mean +0.0221 +0.0221 22 23 22 23
div<uint256, udivrem>/128_mean -0.0029 -0.0029 28 28 28 28
div<uint256, udivrem>/192_mean -0.0097 -0.0097 33 33 33 33
div<uint256, udivrem>/256_mean -0.0031 -0.0031 29 29 29 29
div<uint256, gmp::udivrem>/64_mean +0.0238 +0.0238 24 24 24 24
div<uint256, gmp::udivrem>/128_mean +0.0119 +0.0119 31 32 31 32
div<uint256, gmp::udivrem>/192_mean +0.0037 +0.0037 52 52 52 52
div<uint256, gmp::udivrem>/256_mean +0.0109 +0.0109 52 52 52 52
div<uint512, udivrem>/64_mean +0.0008 +0.0008 37 37 37 37
div<uint512, udivrem>/128_mean -0.0010 -0.0010 47 47 47 47
div<uint512, udivrem>/192_mean -0.0039 -0.0039 67 67 67 67
div<uint512, udivrem>/256_mean +0.0019 +0.0019 65 65 65 65
div<uint512, gmp::udivrem>/64_mean +0.0206 +0.0206 38 38 38 38
div<uint512, gmp::udivrem>/128_mean -0.0005 -0.0005 54 54 54 54
div<uint512, gmp::udivrem>/192_mean -0.0024 -0.0024 78 78 78 78
div<uint512, gmp::udivrem>/256_mean -0.0073 -0.0073 82 81 82 81
mod<addmod>/64_mean -0.0037 -0.0037 29 29 29 29
mod<addmod>/128_mean -0.0069 -0.0069 32 32 32 32
mod<addmod>/192_mean -0.0064 -0.0064 39 38 39 38
mod<addmod>/256_mean -0.0048 -0.0047 35 35 35 35
mod<addmod_public>/64_mean -0.0019 -0.0019 32 32 32 32
mod<addmod_public>/128_mean +0.0039 +0.0038 34 34 34 34
mod<addmod_public>/192_mean +0.0016 +0.0016 40 40 40 40
mod<addmod_public>/256_mean +0.0032 +0.0032 35 35 35 35
mod<addmod_simple>/64_mean +0.0078 +0.0078 31 31 31 31
mod<addmod_simple>/128_mean +0.0012 +0.0012 33 33 33 33
mod<addmod_simple>/192_mean -0.0048 -0.0048 39 39 39 39
mod<addmod_simple>/256_mean +0.0075 +0.0075 34 34 34 34
mod<addmod_prenormalize>/64_mean -0.0509 -0.0509 60 57 60 57
mod<addmod_prenormalize>/128_mean -0.0022 -0.0022 63 63 63 63
mod<addmod_prenormalize>/192_mean -0.0193 -0.0193 74 73 74 73
mod<addmod_prenormalize>/256_mean -0.0155 -0.0155 57 56 57 56
mod<addmod_daosvik_v1>/64_mean -0.0062 -0.0062 32 31 32 31
mod<addmod_daosvik_v1>/128_mean +0.0032 +0.0032 33 33 33 33
mod<addmod_daosvik_v1>/192_mean -0.0040 -0.0040 39 39 39 39
mod<addmod_daosvik_v1>/256_mean +0.0014 +0.0014 34 34 34 34
mod<addmod_daosvik_v2>/64_mean +0.0077 +0.0077 31 32 31 32
mod<addmod_daosvik_v2>/128_mean -0.0024 -0.0024 33 33 33 33
mod<addmod_daosvik_v2>/192_mean +0.0014 +0.0014 39 39 39 39
mod<addmod_daosvik_v2>/256_mean -0.0003 -0.0003 34 34 34 34
mod<mulmod>/64_mean -0.0283 -0.0283 51 50 51 50
mod<mulmod>/128_mean -0.0324 -0.0324 61 59 61 59
mod<mulmod>/192_mean -0.0067 -0.0067 79 78 79 78
mod<mulmod>/256_mean -0.0123 -0.0123 78 77 78 77
ecmod<addmod_public>_mean +0.2047 +0.2047 9 11 9 11
ecmod<addmod_simple>_mean +0.0066 +0.0066 25 25 25 25
ecmod<addmod_prenormalize>_mean +0.3527 +0.3527 8 10 8 10
ecmod<addmod_daosvik_v1>_mean -0.3898 -0.3899 33 20 33 20
ecmod<addmod_daosvik_v2>_mean +0.3402 +0.3402 8 11 8 11
ecmod<mulmod>_mean -0.0154 -0.0154 75 73 75 73
binop<uint256, uint256, add>_mean +1.2626 +1.2626 2 5 2 5
binop<uint256, uint256, inline_add>_mean +0.0012 +0.0012 2 2 2 2
binop<uint256, uint256, sub>_mean +1.0722 +1.0722 3 5 3 5
binop<uint256, uint256, inline_sub>_mean +0.0018 +0.0018 2 2 2 2
binop<uint256, uint256, public_mul>_mean +0.0194 +0.0194 6 6 6 6
binop<uint256, uint256, gmp::mul>_mean -0.0003 -0.0003 14 14 14 14
binop<uint512, uint256, umul_>_mean -0.1252 -0.1252 14 12 14 12
binop<uint512, uint256, gmp::mul_full>_mean +0.0012 +0.0012 15 15 15 15
binop<uint512, uint512, add>_mean +0.4870 +0.4869 4 6 4 6
binop<uint512, uint512, inline_add>_mean +0.0043 +0.0043 4 4 4 4
binop<uint512, uint512, sub>_mean +0.4274 +0.4274 4 6 4 6
binop<uint512, uint512, inline_sub>_mean +0.0037 +0.0037 4 4 4 4
binop<uint512, uint512, public_mul>_mean -0.1096 -0.1096 35 31 35 31
binop<uint512, uint512, gmp::mul>_mean -0.0054 -0.0054 52 52 52 52
shift<uint256, uint256, shl_public>/-1_mean +0.0120 +0.0120 3 3 3 3
shift<uint256, uint256, shl_public>/0_mean -0.0199 -0.0199 4 4 4 4
shift<uint256, uint256, shl_public>/1_mean -0.0525 -0.0525 4 4 4 4
shift<uint256, uint256, shl_public>/2_mean +0.0003 +0.0002 3 3 3 3
shift<uint256, uint256, shl_public>/3_mean +0.0689 +0.0689 2 2 2 2
shift<uint256, uint256, shl_halves>/-1_mean -0.0404 -0.0404 3 3 3 3
shift<uint256, uint256, shl_halves>/0_mean -0.0309 -0.0309 4 4 4 4
shift<uint256, uint256, shl_halves>/1_mean -0.0002 -0.0002 4 4 4 4
shift<uint256, uint256, shl_halves>/2_mean -0.1520 -0.1520 3 3 3 3
shift<uint256, uint256, shl_halves>/3_mean -0.0950 -0.0950 2 2 2 2
shift<uint256, uint64_t, shl_public>/-1_mean -0.0402 -0.0402 3 3 3 3
shift<uint256, uint64_t, shl_public>/0_mean -0.1448 -0.1448 4 4 4 4
shift<uint256, uint64_t, shl_public>/1_mean -0.1506 -0.1506 4 4 4 4
shift<uint256, uint64_t, shl_public>/2_mean -0.0001 -0.0001 3 3 3 3
shift<uint256, uint64_t, shl_public>/3_mean -0.0084 -0.0084 2 2 2 2
shift<uint256, uint64_t, shl_halves>/-1_mean +0.0003 +0.0003 3 3 3 3
shift<uint256, uint64_t, shl_halves>/0_mean -0.0059 -0.0059 4 4 4 4
shift<uint256, uint64_t, shl_halves>/1_mean -0.0348 -0.0348 4 4 4 4
shift<uint256, uint64_t, shl_halves>/2_mean -0.0160 -0.0160 3 3 3 3
shift<uint256, uint64_t, shl_halves>/3_mean +0.0018 +0.0018 2 2 2 2
shift<uint512, uint512, shl_public>/-1_mean -0.0194 -0.0194 10 10 10 10
shift<uint512, uint512, shl_public>/0_mean -0.0185 -0.0185 10 10 10 10
shift<uint512, uint512, shl_public>/1_mean -0.0192 -0.0192 10 10 10 10
shift<uint512, uint512, shl_public>/2_mean -0.0183 -0.0183 10 10 10 10
shift<uint512, uint512, shl_public>/3_mean -0.0184 -0.0184 9 9 9 9
shift<uint512, uint64_t, shl_public>/-1_mean -0.0019 -0.0019 9 9 9 9
shift<uint512, uint64_t, shl_public>/0_mean -0.0034 -0.0034 9 9 9 9
shift<uint512, uint64_t, shl_public>/1_mean -0.0007 -0.0007 9 9 9 9
shift<uint512, uint64_t, shl_public>/2_mean +0.0009 +0.0009 9 9 9 9
shift<uint512, uint64_t, shl_public>/3_mean +0.0006 +0.0006 8 8 8 8
compare<lt_public>/64_mean +0.0058 +0.0058 2 2 2 2
compare<lt_public>/128_mean -0.1167 -0.1167 2 2 2 2
compare<lt_public>/192_mean -0.0361 -0.0361 2 2 2 2
compare<lt_public>/256_mean -0.2078 -0.2078 2 1 2 1
compare<lt_sub>/64_mean -0.0066 -0.0066 2 2 2 2
compare<lt_sub>/128_mean -0.0063 -0.0063 2 2 2 2
compare<lt_sub>/192_mean -0.0059 -0.0059 2 2 2 2
compare<lt_sub>/256_mean -0.0066 -0.0066 2 2 2 2
compare<lt_wordcmp>/64_mean -0.0029 -0.0029 2 2 2 2
compare<lt_wordcmp>/128_mean +0.2282 +0.2282 4 5 4 5
compare<lt_wordcmp>/192_mean +0.4004 +0.4004 4 5 4 5
compare<lt_wordcmp>/256_mean +0.4286 +0.4286 4 5 4 5
compare<lt_halves>/64_mean +0.0044 +0.0044 3 3 3 3
compare<lt_halves>/128_mean +0.0032 +0.0032 3 3 3 3
compare<lt_halves>/192_mean +0.0051 +0.0051 3 3 3 3
compare<lt_halves>/256_mean +0.0046 +0.0046 3 3 3 3
exponentiation/64_mean -0.0100 -0.0100 729 722 729 722
exponentiation/128_mean -0.0073 -0.0073 1457 1447 1457 1447
exponentiation/192_mean -0.0125 -0.0125 2190 2163 2190 2163
exponentiation/256_mean -0.0088 -0.0088 2914 2888 2914 2888
exponentiation2/0_mean -0.0201 -0.0201 11 11 11 11
exponentiation2/64_mean -0.0217 -0.0217 11 11 11 11
exponentiation2/128_mean -0.0781 -0.0781 12 11 12 11
exponentiation2/256_mean -0.0216 -0.0216 10 10 10 10
exponentiation2/512_mean -0.0211 -0.0211 10 10 10 10
count_sigificant_words_256/0_mean -0.0005 -0.0005 0 0 0 0
count_sigificant_words_256/1_mean +0.0010 +0.0010 0 0 0 0
count_sigificant_words_256/2_mean +0.0024 +0.0024 0 0 0 0
count_sigificant_words_256/3_mean +0.0009 +0.0009 0 0 0 0
count_sigificant_words_256/4_mean -0.0000 -0.0000 0 0 0 0
count_sigificant_words_256/5_mean -0.0009 -0.0009 0 0 0 0
count_sigificant_words_256/6_mean -0.0011 -0.0011 0 0 0 0
count_sigificant_words_256/7_mean +0.0005 +0.0005 0 0 0 0
count_sigificant_words_256/8_mean +0.0005 +0.0005 0 0 0 0
to_string<uint128>_mean -0.0043 -0.0043 443 441 443 441
to_string<uint256>_mean +0.0029 +0.0029 2428 2434 2428 2434
to_string<uint512>_mean -0.0102 -0.0102 7608 7530 7608 7530
Thanks for looking into it @chfast! I'm not sure how to read the numbers you posted, but I am surprised that you found add/sub got slower; in my tests it was faster. I noticed that sometimes running the same code twice gave different numbers, though! Benchmarks are hard.
Also why diff between add and inline_add?
I'm not sure how to read the numbers
This is a comparison between master and your changes. The columns are:
- CPU time diff,
- wall-clock time diff,
- master CPU time in ns,
- your CPU time in ns,
- master wall-clock time in ns,
- your wall-clock time in ns.
This comes from a tool comparing outputs of test/intx-bench.
Also why diff between add and inline_add?
Because the benchmark runs in a loop, the "inline_add" can be vectorized. This is not relevant for the EVM use case. I could probably review all the benchmark cases and remove half of them.
Thanks!
the "inline_add" can be vectorized
vectorized or inlined? Why shouldn't it be relevant for EVM?
The main difference is that the addc implementation tries to use compiler builtins via the __has_builtin() macro. This macro is not available until GCC 10 (although the builtin itself is supported). This can be "fixed" by adding more preprocessor conditions, but I try to stick to the latest compiler versions when performance matters.
https://godbolt.org/z/1cxc6aTG3
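The preprocessor workaround mentioned above could look something like this: define __has_builtin to 0 when the compiler does not provide it, so the builtin path is simply skipped there. This is a hedged sketch; the macro name and fallback are illustrative, not intx's actual code.

```cpp
#include <cstdint>

// Guard __has_builtin itself for compilers (e.g. GCC < 10) that lack it.
#ifdef __has_builtin
#  define INTX_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define INTX_HAS_BUILTIN(x) 0
#endif

inline uint64_t add_with_carry(uint64_t x, uint64_t y, bool& carry) noexcept
{
#if INTX_HAS_BUILTIN(__builtin_addcll)
    // Clang (and recent GCC) provide __builtin_addcll with an explicit
    // carry-in and carry-out.
    unsigned long long carry_out = 0;
    const auto s = __builtin_addcll(x, y, carry, &carry_out);
    carry = carry_out != 0;
    return s;
#else
    // Portable fallback: carry out iff x + y + carry_in overflowed 64 bits.
    const uint64_t s = x + y + carry;
    carry = s < x || (s == x && carry);
    return s;
#endif
}
```

The downside, as noted, is accumulating preprocessor conditions per compiler version, which is why targeting recent compilers may be the simpler policy.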
@chfast, if you are using GCC 11 (and therefore the builtin), I am puzzled that my version would be slower.