rust Improved `checked_isqrt` and `isqrt` methods

Improved tests of `isqrt` and `checked_isqrt` implementations

Inputs chosen more thoroughly and systematically.
Checks that isqrt and checked_isqrt have equivalent results for signed types, either equivalent numerically or equivalent as a panic and a None.
Checks that isqrt has numerically-equivalent results for unsigned types and their NonZero counterparts.

Added benchmarks for `isqrt` implementations

Greatly sped up `checked_isqrt` and `isqrt` methods

Uses a lookup table for 8-bit integers and then the Karatsuba square root algorithm for larger integers.
Includes optimization hints that give the compiler the exact numeric range of results.

Feature tracking issue

isqrt is an unstable feature tracked at #116226.

Benchmarked improvements

Command used to benchmark

./x bench library/core -- int_sqrt

Before

benchmarks:
    num::int_sqrt::i128::isqrt           439591.65/iter  +/- 6652.70
    num::int_sqrt::i16::isqrt              5302.97/iter   +/- 160.93
    num::int_sqrt::i32::isqrt             62999.11/iter  +/- 2022.05
    num::int_sqrt::i64::isqrt            125248.81/iter  +/- 1674.43
    num::int_sqrt::i8::isqrt                123.56/iter     +/- 1.87
    num::int_sqrt::isize::isqrt          125356.56/iter  +/- 1017.03
    num::int_sqrt::non_zero_u128::isqrt  437443.75/iter  +/- 3535.43
    num::int_sqrt::non_zero_u16::isqrt     8604.58/iter    +/- 94.76
    num::int_sqrt::non_zero_u32::isqrt    62933.33/iter   +/- 517.30
    num::int_sqrt::non_zero_u64::isqrt   125076.38/iter +/- 11340.61
    num::int_sqrt::non_zero_u8::isqrt       221.51/iter     +/- 1.58
    num::int_sqrt::non_zero_usize::isqrt 136005.21/iter  +/- 2020.35
    num::int_sqrt::u128::isqrt           439014.55/iter  +/- 3920.45
    num::int_sqrt::u16::isqrt              8575.08/iter   +/- 148.06
    num::int_sqrt::u32::isqrt             63008.89/iter   +/- 803.67
    num::int_sqrt::u64::isqrt            125088.09/iter   +/- 879.29
    num::int_sqrt::u8::isqrt                230.18/iter     +/- 2.04
    num::int_sqrt::usize::isqrt          125237.51/iter  +/- 4747.83

After

benchmarks:
    num::int_sqrt::i128::isqrt           105184.89/iter +/- 1171.38
    num::int_sqrt::i16::isqrt              1910.26/iter   +/- 78.50
    num::int_sqrt::i32::isqrt             34260.34/iter  +/- 960.84
    num::int_sqrt::i64::isqrt             45939.19/iter +/- 2525.65
    num::int_sqrt::i8::isqrt                 22.87/iter    +/- 0.45
    num::int_sqrt::isize::isqrt           45884.17/iter  +/- 595.49
    num::int_sqrt::non_zero_u128::isqrt  106344.27/iter  +/- 780.99
    num::int_sqrt::non_zero_u16::isqrt     2790.19/iter   +/- 53.43
    num::int_sqrt::non_zero_u32::isqrt    33613.99/iter  +/- 362.96
    num::int_sqrt::non_zero_u64::isqrt    46235.42/iter  +/- 429.69
    num::int_sqrt::non_zero_u8::isqrt        31.78/iter    +/- 0.75
    num::int_sqrt::non_zero_usize::isqrt  46208.75/iter  +/- 375.27
    num::int_sqrt::u128::isqrt           106385.94/iter +/- 1649.95
    num::int_sqrt::u16::isqrt              2747.69/iter   +/- 28.72
    num::int_sqrt::u32::isqrt             33627.09/iter  +/- 475.68
    num::int_sqrt::u64::isqrt             46182.29/iter  +/- 311.16
    num::int_sqrt::u8::isqrt                 33.10/iter    +/- 0.30
    num::int_sqrt::usize::isqrt           46165.00/iter  +/- 388.41

Jul 25 '24 00:07 ChaiTRex

r? @tgross35

rustbot has assigned @tgross35. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Jul 25 '24 00:07 rustbot

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

#16 2.748 Building wheels for collected packages: reuse
#16 2.749   Building wheel for reuse (pyproject.toml): started
#16 3.000   Building wheel for reuse (pyproject.toml): finished with status 'done'
#16 3.002   Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#16 3.002   Stored in directory: /tmp/pip-ephem-wheel-cache-empqb9u2/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#16 3.005 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#16 3.406 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#16 3.406 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#16 DONE 3.5s
---
   |
46 | mod int_sqrt;
   | ^^^^^^^^^^^^^
   |
   = help: to create the module `int_sqrt`, create file "library/core/src/num/int_sqrt.rs" or "library/core/src/num/int_sqrt/mod.rs"
   = note: if there is a `mod int_sqrt` elsewhere in the crate already, import it with `use crate::...` instead
   Compiling std v0.0.0 (/checkout/library/std)
error[E0425]: cannot find function `u8` in module `super::int_sqrt`
    --> library/core/src/num/nonzero.rs:1694:26
     |
---
    --> library/core/src/num/int_macros.rs:2781:39
     |
1    | /  macro_rules! int_impl {
2    | |      (
3    | |          Self = $SelfT:ty,
4    | |          ActualT = $ActualT:ident,
2781 | |                  crate::num::int_sqrt::panic_for_negative_argument();
     | |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
...    |
3616 | |      }
---
    --> library/core/src/num/int_macros.rs:2781:39
     |
1    | /  macro_rules! int_impl {
2    | |      (
3    | |          Self = $SelfT:ty,
4    | |          ActualT = $ActualT:ident,
2781 | |                  crate::num::int_sqrt::panic_for_negative_argument();
     | |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
...    |
3616 | |      }
---
    --> library/core/src/num/int_macros.rs:2781:39
     |
1    | /  macro_rules! int_impl {
2    | |      (
3    | |          Self = $SelfT:ty,
4    | |          ActualT = $ActualT:ident,
2781 | |                  crate::num::int_sqrt::panic_for_negative_argument();
     | |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
...    |
3616 | |      }
---
    --> library/core/src/num/int_macros.rs:2781:39
     |
1    | /  macro_rules! int_impl {
2    | |      (
3    | |          Self = $SelfT:ty,
4    | |          ActualT = $ActualT:ident,
2781 | |                  crate::num::int_sqrt::panic_for_negative_argument();
     | |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
...    |
3616 | |      }
---
    --> library/core/src/num/int_macros.rs:2781:39
     |
1    | /  macro_rules! int_impl {
2    | |      (
3    | |          Self = $SelfT:ty,
4    | |          ActualT = $ActualT:ident,
2781 | |                  crate::num::int_sqrt::panic_for_negative_argument();
     | |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
...    |
3616 | |      }
---
    --> library/core/src/num/int_macros.rs:2781:39
     |
1    | /  macro_rules! int_impl {
2    | |      (
3    | |          Self = $SelfT:ty,
4    | |          ActualT = $ActualT:ident,
2781 | |                  crate::num::int_sqrt::panic_for_negative_argument();
     | |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
...    |
3616 | |      }
---
    --> library/core/src/num/int_macros.rs:2780:25
     |
1    | /   macro_rules! int_impl {
2    | |       (
3    | |           Self = $SelfT:ty,
4    | |           ActualT = $ActualT:ident,
2780 | |               if self < 0 {
     | |  _________________________^
2781 | | |                 crate::num::int_sqrt::panic_for_negative_argument();
2782 | | |             } else {
---
    --> library/core/src/num/int_macros.rs:2780:25
     |
1    | /   macro_rules! int_impl {
2    | |       (
3    | |           Self = $SelfT:ty,
4    | |           ActualT = $ActualT:ident,
2780 | |               if self < 0 {
     | |  _________________________^
2781 | | |                 crate::num::int_sqrt::panic_for_negative_argument();
2782 | | |             } else {
---
    --> library/core/src/num/int_macros.rs:2780:25
     |
1    | /   macro_rules! int_impl {
2    | |       (
3    | |           Self = $SelfT:ty,
4    | |           ActualT = $ActualT:ident,
2780 | |               if self < 0 {
     | |  _________________________^
2781 | | |                 crate::num::int_sqrt::panic_for_negative_argument();
2782 | | |             } else {
---
    --> library/core/src/num/int_macros.rs:2780:25
     |
1    | /   macro_rules! int_impl {
2    | |       (
3    | |           Self = $SelfT:ty,
4    | |           ActualT = $ActualT:ident,
2780 | |               if self < 0 {
     | |  _________________________^
2781 | | |                 crate::num::int_sqrt::panic_for_negative_argument();
2782 | | |             } else {
---
    --> library/core/src/num/int_macros.rs:2780:25
     |
1    | /   macro_rules! int_impl {
2    | |       (
3    | |           Self = $SelfT:ty,
4    | |           ActualT = $ActualT:ident,
2780 | |               if self < 0 {
     | |  _________________________^
2781 | | |                 crate::num::int_sqrt::panic_for_negative_argument();
2782 | | |             } else {
---
    --> library/core/src/num/int_macros.rs:2780:25
     |
1    | /   macro_rules! int_impl {
2    | |       (
3    | |           Self = $SelfT:ty,
4    | |           ActualT = $ActualT:ident,
2780 | |               if self < 0 {
     | |  _________________________^
2781 | | |                 crate::num::int_sqrt::panic_for_negative_argument();
2782 | | |             } else {
---
457  |   |         Self = isize,
458  |   |         ActualT = i64,
459  |   |         UnsignedT = usize,
...      |
474  |   |         bound_condition = " on 64-bit targets",
475  |   |     }
     |   |_____- in this macro invocation
Some errors have detailed explanations: E0308, E0425, E0583.
For more information about an error, try `rustc --explain E0308`.
error: could not compile `core` (lib) due to 31 previous errors
Build completed unsuccessfully in 0:00:27

Jul 25 '24 00:07 rust-log-analyzer

I'll take a closer look in a bit but FYI your last commit message has a typo (s/sped/speed)

Jul 25 '24 00:07 tgross35

The last force push added a missing file.

I'll take a closer look in a bit but FYI your last commit message has a typo (s/sped/speed)

OK, thanks for taking a look. Oh, I was using 'sped' as a past tense of the verb form of 'speed'.

Jul 25 '24 00:07 ChaiTRex

I'll take a closer look in a bit but FYI your last commit message has a typo (s/sped/speed)

OK, thanks for taking a look. Oh, I was using 'sped' as a past tense of the verb form of 'speed'.

Ah, I see you did the same for the other commits too. Commit messages are usually in imperative mood to describe what the change does, i.e. "Improve test", "Add benchmarks" "Greatly speed up". Hence my confusion :)

(Rust has no commit message policy so no need to anything if you don't feel like it).

Jul 25 '24 01:07 tgross35

Looks pretty reasonable after a more thorough read through, awesome improvements to the tests. I think this should be good after combining some similar code, just to make things a bit easier to understand & maintain.

If you run benchmarks again, could you post your cpu model for a reference?

@rustbot author

Jul 26 '24 08:07 tgross35

If you run benchmarks again, could you post your cpu model for a reference?

I was using aarch64-apple-darwin on a MacBook Air with an M1 processor under macOS 14.5.1 (now 14.6).

After I replaced the benchmarks with equivalents to ilog10's benchmarks (which use an exponential distribution for the random inputs), I get the following:

Benchmarks

Before this pull request

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  108945.82/iter +/- 1673.86
    num::int_sqrt::u128_sqrt_random        16389.17/iter  +/- 231.44
    num::int_sqrt::u128_sqrt_random_small   1290.22/iter   +/- 41.90
    num::int_sqrt::u16_sqrt_predictable     1080.06/iter   +/- 43.44
    num::int_sqrt::u16_sqrt_random          1064.57/iter   +/- 20.79
    num::int_sqrt::u16_sqrt_random_small     579.85/iter   +/- 13.81
    num::int_sqrt::u32_sqrt_predictable     2979.65/iter   +/- 29.79
    num::int_sqrt::u32_sqrt_random          1718.20/iter   +/- 58.86
    num::int_sqrt::u32_sqrt_random_small     480.19/iter   +/- 26.59
    num::int_sqrt::u64_sqrt_predictable    13089.55/iter  +/- 484.72
    num::int_sqrt::u64_sqrt_random          3751.04/iter   +/- 79.66
    num::int_sqrt::u64_sqrt_random_small     498.58/iter    +/- 3.60
    num::int_sqrt::u8_sqrt_predictable       329.07/iter   +/- 12.64
    num::int_sqrt::u8_sqrt_random            576.37/iter   +/- 47.12
    num::int_sqrt::u8_sqrt_random_small      589.01/iter   +/- 22.53

When I first made this pull request

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  30728.33/iter +/- 1093.31
    num::int_sqrt::u128_sqrt_random        4392.30/iter  +/- 149.35
    num::int_sqrt::u128_sqrt_random_small   323.58/iter    +/- 7.26
    num::int_sqrt::u16_sqrt_predictable     304.50/iter   +/- 10.60
    num::int_sqrt::u16_sqrt_random          334.78/iter   +/- 18.02
    num::int_sqrt::u16_sqrt_random_small    139.06/iter    +/- 2.50
    num::int_sqrt::u32_sqrt_predictable    1585.35/iter   +/- 59.03
    num::int_sqrt::u32_sqrt_random          972.40/iter   +/- 38.47
    num::int_sqrt::u32_sqrt_random_small    149.02/iter    +/- 4.50
    num::int_sqrt::u64_sqrt_predictable    5641.28/iter  +/- 136.16
    num::int_sqrt::u64_sqrt_random         1703.97/iter   +/- 43.75
    num::int_sqrt::u64_sqrt_random_small    197.42/iter    +/- 4.03
    num::int_sqrt::u8_sqrt_predictable       42.90/iter    +/- 1.39
    num::int_sqrt::u8_sqrt_random           133.55/iter    +/- 3.92
    num::int_sqrt::u8_sqrt_random_small     133.85/iter    +/- 2.36

With the changes I'm going to push soon

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  26622.63/iter +/- 386.54
    num::int_sqrt::u128_sqrt_random        3584.59/iter +/- 131.42
    num::int_sqrt::u128_sqrt_random_small   904.23/iter  +/- 22.04
    num::int_sqrt::u16_sqrt_predictable     388.01/iter   +/- 7.85
    num::int_sqrt::u16_sqrt_random          480.00/iter  +/- 16.07
    num::int_sqrt::u16_sqrt_random_small    332.05/iter   +/- 7.46
    num::int_sqrt::u32_sqrt_predictable    1317.23/iter  +/- 25.71
    num::int_sqrt::u32_sqrt_random          783.17/iter   +/- 8.53
    num::int_sqrt::u32_sqrt_random_small    415.10/iter  +/- 14.06
    num::int_sqrt::u64_sqrt_predictable    4563.42/iter +/- 129.48
    num::int_sqrt::u64_sqrt_random         1378.32/iter  +/- 60.46
    num::int_sqrt::u64_sqrt_random_small    658.41/iter   +/- 8.93
    num::int_sqrt::u8_sqrt_predictable       42.90/iter   +/- 0.82
    num::int_sqrt::u8_sqrt_random           132.98/iter   +/- 1.53
    num::int_sqrt::u8_sqrt_random_small     132.94/iter   +/- 2.94

There's a tradeoff here. In the push I'm going to make soon, randomly chosen inputs from the whole input type range get a decent speedup from the initial pull request's code, but randomly chosen small inputs get about 3 times slower. Is that a good tradeoff? Is there another benchmark that should be added?

Aug 01 '24 06:08 ChaiTRex

OK, I've pushed my latest changes.

Please ignore commits 31577f5 and 82389f4. Apparently HEAD was detached on my machine.

Aug 01 '24 06:08 ChaiTRex

With a uniform distribution, we get these results:

Uniform distribution benchmarks

original_u8             time:   [9.7782 ns 9.8095 ns 9.8501 ns]
karatsuba_u8            time:   [5.2377 ns 5.2448 ns 5.2531 ns]
karatsuba_2_u8          time:   [5.2304 ns 5.2347 ns 5.2399 ns]

original_u16            time:   [13.415 ns 13.437 ns 13.465 ns]
karatsuba_u16           time:   [6.7425 ns 6.7487 ns 6.7568 ns]
karatsuba_2_u16         time:   [6.3988 ns 6.4035 ns 6.4094 ns]

original_u32            time:   [19.016 ns 19.044 ns 19.078 ns]
karatsuba_u32           time:   [10.162 ns 10.182 ns 10.211 ns]
karatsuba_2_u32         time:   [8.8756 ns 8.8866 ns 8.8993 ns]

original_u64            time:   [44.361 ns 44.413 ns 44.480 ns]
karatsuba_u64           time:   [19.043 ns 19.144 ns 19.276 ns]
karatsuba_2_u64         time:   [16.330 ns 16.359 ns 16.392 ns]

original_u128           time:   [147.14 ns 147.49 ns 147.88 ns]
karatsuba_u128          time:   [51.335 ns 51.455 ns 51.602 ns]
karatsuba_2_u128        time:   [45.447 ns 45.507 ns 45.575 ns]

Aug 01 '24 07:08 ChaiTRex

Thanks for the changes, this looks a lot tidier with the deduplication. I left a handful of new comments but those are just small things, nothing major.

I'm not sure whether you had been waiting on a review, but comment @rustbot ready whenever you are done. This will need a squash before merge but no need to do that until it is final.

Aug 12 '24 07:08 tgross35

There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.

You can start a rebase with the following commands:

$ # rebase
$ git pull --rebase https://github.com/rust-lang/rust.git master
$ git push --force-with-lease

The following commits are merge commits:

d61b4ddb16cf89aedfefa546f0566cc8c4041875

Aug 19 '24 23:08 rustbot

With regard to the squash, would it be alright to have two final commits, one for the testing and benchmarking and the next for the implementation? This would allow the benchmarking at least to be run on the prior implementation for comparison.

Aug 19 '24 23:08 ChaiTRex

@rustbot ready

Aug 20 '24 00:08 ChaiTRex

Aligning the lookup table to cache lines speeds up u8 on Apple M1:

Apple M1 benchmarks

Before cache line alignment:

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  25464.64/iter +/- 1273.90
    num::int_sqrt::u128_sqrt_random        3432.59/iter   +/- 25.72
    num::int_sqrt::u128_sqrt_random_small   808.12/iter   +/- 17.95
    num::int_sqrt::u128_sqrt_uniform       5913.12/iter  +/- 373.72
    num::int_sqrt::u16_sqrt_predictable     422.50/iter   +/- 24.88
    num::int_sqrt::u16_sqrt_random          521.85/iter   +/- 20.16
    num::int_sqrt::u16_sqrt_random_small    412.85/iter   +/- 10.96
    num::int_sqrt::u16_sqrt_uniform         618.18/iter   +/- 21.46
    num::int_sqrt::u32_sqrt_predictable    1267.82/iter   +/- 40.84
    num::int_sqrt::u32_sqrt_random          735.54/iter   +/- 15.50
    num::int_sqrt::u32_sqrt_random_small    413.31/iter    +/- 7.09
    num::int_sqrt::u32_sqrt_uniform        1043.55/iter   +/- 21.41
    num::int_sqrt::u64_sqrt_predictable    4151.74/iter  +/- 109.88
    num::int_sqrt::u64_sqrt_random         1276.77/iter   +/- 27.18
    num::int_sqrt::u64_sqrt_random_small    575.61/iter   +/- 12.93
    num::int_sqrt::u64_sqrt_uniform        1679.02/iter   +/- 40.40
    num::int_sqrt::u8_sqrt_predictable       42.26/iter    +/- 0.88
    num::int_sqrt::u8_sqrt_random           170.62/iter    +/- 4.95
    num::int_sqrt::u8_sqrt_random_small     170.88/iter    +/- 3.79
    num::int_sqrt::u8_sqrt_uniform          170.88/iter    +/- 2.93

After cache line alignment:

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  25146.56/iter +/- 145.08
    num::int_sqrt::u128_sqrt_random        3433.57/iter  +/- 26.17
    num::int_sqrt::u128_sqrt_random_small   818.61/iter  +/- 53.39
    num::int_sqrt::u128_sqrt_uniform       5891.31/iter +/- 147.90
    num::int_sqrt::u16_sqrt_predictable     411.59/iter  +/- 11.32
    num::int_sqrt::u16_sqrt_random          527.51/iter  +/- 25.60
    num::int_sqrt::u16_sqrt_random_small    406.55/iter   +/- 7.55
    num::int_sqrt::u16_sqrt_uniform         617.40/iter  +/- 19.75
    num::int_sqrt::u32_sqrt_predictable    1255.94/iter  +/- 21.66
    num::int_sqrt::u32_sqrt_random          738.56/iter  +/- 17.08
    num::int_sqrt::u32_sqrt_random_small    409.39/iter   +/- 8.64
    num::int_sqrt::u32_sqrt_uniform        1040.39/iter  +/- 21.58
    num::int_sqrt::u64_sqrt_predictable    4144.49/iter +/- 264.87
    num::int_sqrt::u64_sqrt_random         1280.68/iter  +/- 31.78
    num::int_sqrt::u64_sqrt_random_small    574.02/iter  +/- 11.66
    num::int_sqrt::u64_sqrt_uniform        1677.55/iter  +/- 16.71
    num::int_sqrt::u8_sqrt_predictable       42.19/iter   +/- 0.80
    num::int_sqrt::u8_sqrt_random           130.60/iter   +/- 2.61
    num::int_sqrt::u8_sqrt_random_small     130.79/iter   +/- 2.30
    num::int_sqrt::u8_sqrt_uniform          131.69/iter   +/- 2.58

It has no effect on Ryzen 9 5900X:

Ryzen 9 5900X benchmarks

Before cache line alignment:

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  19691.09/iter +/- 741.04
    num::int_sqrt::u128_sqrt_random        2894.61/iter  +/- 38.25
    num::int_sqrt::u128_sqrt_random_small  1154.15/iter  +/- 13.97
    num::int_sqrt::u128_sqrt_uniform       4114.75/iter +/- 239.00
    num::int_sqrt::u16_sqrt_predictable     536.08/iter  +/- 24.90
    num::int_sqrt::u16_sqrt_random          593.93/iter  +/- 14.95
    num::int_sqrt::u16_sqrt_random_small    503.55/iter   +/- 8.22
    num::int_sqrt::u16_sqrt_uniform         844.66/iter   +/- 6.84
    num::int_sqrt::u32_sqrt_predictable    1623.75/iter  +/- 25.16
    num::int_sqrt::u32_sqrt_random          870.58/iter   +/- 8.56
    num::int_sqrt::u32_sqrt_random_small    534.55/iter  +/- 22.62
    num::int_sqrt::u32_sqrt_uniform        1344.67/iter  +/- 26.97
    num::int_sqrt::u64_sqrt_predictable    4681.89/iter +/- 173.77
    num::int_sqrt::u64_sqrt_random         1392.21/iter  +/- 51.08
    num::int_sqrt::u64_sqrt_random_small    748.14/iter   +/- 7.08
    num::int_sqrt::u64_sqrt_uniform        1854.12/iter  +/- 60.87
    num::int_sqrt::u8_sqrt_predictable       43.19/iter   +/- 0.97
    num::int_sqrt::u8_sqrt_random           114.65/iter   +/- 2.01
    num::int_sqrt::u8_sqrt_random_small     113.95/iter   +/- 1.03
    num::int_sqrt::u8_sqrt_uniform          114.16/iter   +/- 1.29

After cache line alignment:

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  19642.04/iter +/- 206.92
    num::int_sqrt::u128_sqrt_random        2830.08/iter +/- 141.34
    num::int_sqrt::u128_sqrt_random_small  1153.06/iter  +/- 13.33
    num::int_sqrt::u128_sqrt_uniform       4076.04/iter +/- 234.03
    num::int_sqrt::u16_sqrt_predictable     526.21/iter  +/- 28.71
    num::int_sqrt::u16_sqrt_random          581.78/iter   +/- 8.87
    num::int_sqrt::u16_sqrt_random_small    494.27/iter   +/- 5.85
    num::int_sqrt::u16_sqrt_uniform         828.62/iter  +/- 16.26
    num::int_sqrt::u32_sqrt_predictable    1611.76/iter  +/- 12.15
    num::int_sqrt::u32_sqrt_random          892.10/iter   +/- 6.39
    num::int_sqrt::u32_sqrt_random_small    547.66/iter   +/- 3.65
    num::int_sqrt::u32_sqrt_uniform        1326.88/iter  +/- 37.17
    num::int_sqrt::u64_sqrt_predictable    4748.65/iter  +/- 48.63
    num::int_sqrt::u64_sqrt_random         1434.99/iter  +/- 17.58
    num::int_sqrt::u64_sqrt_random_small    766.33/iter   +/- 9.79
    num::int_sqrt::u64_sqrt_uniform        1902.92/iter  +/- 16.71
    num::int_sqrt::u8_sqrt_predictable       43.87/iter   +/- 2.40
    num::int_sqrt::u8_sqrt_random           116.94/iter   +/- 3.94
    num::int_sqrt::u8_sqrt_random_small     116.49/iter   +/- 1.41
    num::int_sqrt::u8_sqrt_uniform          116.40/iter   +/- 0.80

Aug 20 '24 15:08 ChaiTRex

@rustbot author

Aug 25 '24 01:08 tgross35

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

#16 2.759 Building wheels for collected packages: reuse
#16 2.760   Building wheel for reuse (pyproject.toml): started
#16 3.006   Building wheel for reuse (pyproject.toml): finished with status 'done'
#16 3.007   Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#16 3.008   Stored in directory: /tmp/pip-ephem-wheel-cache-zyqgdrpr/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#16 3.011 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#16 3.407 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#16 3.408 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#16 DONE 3.5s
---
   Compiling std v0.0.0 (/checkout/library/std)
error: unused import: `hint`
  --> core/src/num/nonzero.rs:10:18
   |
10 | use crate::{fmt, hint, intrinsics, ptr, ub_checks};
   |
   = note: `-D unused-imports` implied by `-D warnings`
   = help: to override `-D warnings` add `#[allow(unused_imports)]`

Aug 26 '24 04:08 rust-log-analyzer

OK, I've force pushed the modified history.

Verification that file changes before history rewrite and after history rewrite are identical

Last commit before modifying history: 35fb40cace0b75860a97d9b71f27fdefcd51242b.

$ git diff 35fb40cace0b75860a97d9b71f27fdefcd51242b
$ git log -n 3
commit e1d06300d4e89ce6cd5ff9f92acc06cc49e37bd3 (HEAD -> isqrt)
Author: Chai T. Rex <[email protected]>
Date:   Mon Aug 26 01:36:25 2024 -0400

    Speed up `checked_isqrt` and `isqrt` methods

    * Use a lookup table for 8-bit integers and the Karatsuba square root
      algorithm for larger integers.
    * Include optimization hints that give the compiler the exact numeric
      range of results.

commit 6cd9bcdd815dd43256ca314b962788db639c5c6c
Author: Chai T. Rex <[email protected]>
Date:   Mon Aug 26 01:34:25 2024 -0400

    Improve `isqrt` tests and add benchmarks

    * Choose test inputs more thoroughly and systematically.
    * Check that `isqrt` and `checked_isqrt` have equivalent results for
      signed types, either equivalent numerically or equivalent as a panic
      and a `None`.
    * Check that `isqrt` has numerically-equivalent results for unsigned
      types and their `NonZero` counterparts.
    * Reuse `ilog10` benchmarks, plus benchmarks that use a uniform
      distribution.

commit 636d7ff91b9847d6d43c7bbe023568828f6e3246
Merge: 5601d142498 77303568c04
Author: bors <[email protected]>
Date:   Mon Aug 19 20:42:45 2024 +0000

    Auto merge of #129275 - matthiaskrgr:rollup-qv64hg6, r=matthiaskrgr

    Rollup of 6 pull requests

    Successful merges:

     - #129194 (Fix bootstrap test `detect_src_and_out` on Windows)
     - #129217 (safe transmute: check lifetimes)
     - #129223 ( Fix wrong argument for `get_fn_decl`)
     - #129235 (Check that `#[may_dangle]` is properly applied)
     - #129245 (Fix a typo in `rustc_hir` doc comment)
     - #129271 (Prevent double panic in query system, improve diagnostics)

    r? `@ghost`
    `@rustbot` modify labels: rollup

Aug 26 '24 06:08 ChaiTRex

Verification that file changes before history rewrite and after history rewrite are identical

GitHub did that already :) from the compare link it gives your push https://github.com/rust-lang/rust/compare/35fb40cace0b75860a97d9b71f27fdefcd51242b..e1d06300d4e89ce6cd5ff9f92acc06cc49e37bd3 (works as long as the base doesn't change, otherwise it's terrible).

This looks awesome, thank you for putting all the effort in! Post benchmarks after the final changes if you get the chance.

@bors r+

Aug 26 '24 06:08 tgross35

:pushpin: Commit e1d06300d4e89ce6cd5ff9f92acc06cc49e37bd3 has been approved by tgross35

It is now in the queue for this repository.

Aug 26 '24 06:08 bors

This looks awesome, thank you for putting all the effort in!

Thank you for all the suggestions for improvements!

Post benchmarks after the final changes if you get the chance.

Apple M1 benchmarks

Before changes

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  114378.64/iter +/- 2533.28
    num::int_sqrt::u128_sqrt_random        16750.05/iter  +/- 737.74
    num::int_sqrt::u128_sqrt_random_small   1258.67/iter   +/- 41.06
    num::int_sqrt::u128_sqrt_uniform       33110.56/iter +/- 1104.33

    num::int_sqrt::u16_sqrt_predictable     1117.65/iter   +/- 55.38
    num::int_sqrt::u16_sqrt_random          1116.89/iter +/- 1201.61
    num::int_sqrt::u16_sqrt_random_small     573.27/iter   +/- 16.78
    num::int_sqrt::u16_sqrt_uniform         2072.08/iter  +/- 111.98

    num::int_sqrt::u32_sqrt_predictable     3065.00/iter  +/- 206.74
    num::int_sqrt::u32_sqrt_random          1791.38/iter   +/- 93.77
    num::int_sqrt::u32_sqrt_random_small     461.37/iter   +/- 24.58
    num::int_sqrt::u32_sqrt_uniform         3788.21/iter  +/- 267.84

    num::int_sqrt::u64_sqrt_predictable    13204.51/iter   +/- 97.78
    num::int_sqrt::u64_sqrt_random          3728.35/iter  +/- 169.88
    num::int_sqrt::u64_sqrt_random_small     478.55/iter   +/- 17.05
    num::int_sqrt::u64_sqrt_uniform         9146.69/iter  +/- 156.14

    num::int_sqrt::u8_sqrt_predictable       333.17/iter    +/- 2.83
    num::int_sqrt::u8_sqrt_random            567.97/iter    +/- 5.24
    num::int_sqrt::u8_sqrt_random_small      576.73/iter   +/- 14.10
    num::int_sqrt::u8_sqrt_uniform           952.84/iter   +/- 33.36

After changes

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  24316.02/iter +/- 1175.84
    num::int_sqrt::u128_sqrt_random        3290.37/iter   +/- 57.26
    num::int_sqrt::u128_sqrt_random_small   227.77/iter   +/- 19.86
    num::int_sqrt::u128_sqrt_uniform       5799.31/iter  +/- 171.87

    num::int_sqrt::u16_sqrt_predictable     261.07/iter    +/- 6.76
    num::int_sqrt::u16_sqrt_random          288.43/iter    +/- 5.81
    num::int_sqrt::u16_sqrt_random_small    136.94/iter    +/- 6.20
    num::int_sqrt::u16_sqrt_uniform         556.81/iter   +/- 20.06

    num::int_sqrt::u32_sqrt_predictable    1153.25/iter   +/- 73.52
    num::int_sqrt::u32_sqrt_random          701.98/iter   +/- 31.71
    num::int_sqrt::u32_sqrt_random_small    159.10/iter    +/- 4.83
    num::int_sqrt::u32_sqrt_uniform        1055.76/iter   +/- 18.27

    num::int_sqrt::u64_sqrt_predictable    3979.86/iter   +/- 52.21
    num::int_sqrt::u64_sqrt_random         1178.71/iter   +/- 16.93
    num::int_sqrt::u64_sqrt_random_small    250.95/iter    +/- 7.06
    num::int_sqrt::u64_sqrt_uniform        1740.82/iter   +/- 39.35

    num::int_sqrt::u8_sqrt_predictable       42.29/iter    +/- 1.12
    num::int_sqrt::u8_sqrt_random           132.03/iter    +/- 4.45
    num::int_sqrt::u8_sqrt_random_small     132.94/iter    +/- 5.08
    num::int_sqrt::u8_sqrt_uniform          131.97/iter    +/- 3.98

AMD Ryzen 9 5900X benchmarks

Before changes

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  107571.01/iter +/- 3586.42
    num::int_sqrt::u128_sqrt_random        14957.03/iter  +/- 171.20
    num::int_sqrt::u128_sqrt_random_small   1631.80/iter   +/- 15.79
    num::int_sqrt::u128_sqrt_uniform       29913.22/iter  +/- 491.33

    num::int_sqrt::u16_sqrt_predictable     1028.91/iter    +/- 6.69
    num::int_sqrt::u16_sqrt_random           952.49/iter    +/- 5.02
    num::int_sqrt::u16_sqrt_random_small     571.79/iter    +/- 6.40
    num::int_sqrt::u16_sqrt_uniform         1698.65/iter   +/- 23.04

    num::int_sqrt::u32_sqrt_predictable     2795.33/iter   +/- 12.68
    num::int_sqrt::u32_sqrt_random          1573.76/iter    +/- 8.02
    num::int_sqrt::u32_sqrt_random_small     543.35/iter    +/- 3.44
    num::int_sqrt::u32_sqrt_uniform         2963.37/iter   +/- 13.95

    num::int_sqrt::u64_sqrt_predictable    10291.50/iter  +/- 135.62
    num::int_sqrt::u64_sqrt_random          3012.46/iter   +/- 42.96
    num::int_sqrt::u64_sqrt_random_small     550.93/iter    +/- 5.55
    num::int_sqrt::u64_sqrt_uniform         6346.38/iter  +/- 290.38

    num::int_sqrt::u8_sqrt_predictable       343.44/iter    +/- 9.28
    num::int_sqrt::u8_sqrt_random            567.52/iter    +/- 3.47
    num::int_sqrt::u8_sqrt_random_small      567.99/iter   +/- 12.27
    num::int_sqrt::u8_sqrt_uniform           922.86/iter   +/- 18.49

After changes

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  17672.56/iter +/- 233.18
    num::int_sqrt::u128_sqrt_random        2762.53/iter +/- 127.37
    num::int_sqrt::u128_sqrt_random_small   225.04/iter  +/- 10.44
    num::int_sqrt::u128_sqrt_uniform       3889.07/iter  +/- 52.62

    num::int_sqrt::u16_sqrt_predictable     373.94/iter   +/- 4.32
    num::int_sqrt::u16_sqrt_random          355.54/iter   +/- 3.56
    num::int_sqrt::u16_sqrt_random_small    115.20/iter   +/- 1.77
    num::int_sqrt::u16_sqrt_uniform         709.20/iter  +/- 13.64

    num::int_sqrt::u32_sqrt_predictable    1308.96/iter  +/- 13.35
    num::int_sqrt::u32_sqrt_random          823.75/iter  +/- 15.02
    num::int_sqrt::u32_sqrt_random_small    222.49/iter  +/- 52.82
    num::int_sqrt::u32_sqrt_uniform        1253.50/iter  +/- 30.43

    num::int_sqrt::u64_sqrt_predictable    4170.12/iter  +/- 77.05
    num::int_sqrt::u64_sqrt_random         1312.70/iter  +/- 26.43
    num::int_sqrt::u64_sqrt_random_small    278.75/iter   +/- 4.11
    num::int_sqrt::u64_sqrt_uniform        1813.38/iter  +/- 27.48

    num::int_sqrt::u8_sqrt_predictable       43.85/iter   +/- 1.51
    num::int_sqrt::u8_sqrt_random           110.46/iter   +/- 0.90
    num::int_sqrt::u8_sqrt_random_small     110.51/iter   +/- 1.15
    num::int_sqrt::u8_sqrt_uniform          110.56/iter   +/- 0.96

Aug 26 '24 06:08 ChaiTRex

There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.

You can start a rebase with the following commands:

$ # rebase
$ git pull --rebase https://github.com/rust-lang/rust.git master
$ git push --force-with-lease

The following commits are merge commits:

a1d27b868ca651fd0295e63c6c66fb69f1f3543b

Aug 27 '24 04:08 rustbot

Resolved merge conflict

@rustbot ready

Aug 27 '24 06:08 ChaiTRex

Thanks!

@bors r+

Aug 27 '24 06:08 tgross35

:pushpin: Commit 8e54e871375616606771e4efec2d73409b3d70ce has been approved by tgross35

It is now in the queue for this repository.

Aug 27 '24 06:08 bors

Pretty sure this nailed this rollup: https://github.com/rust-lang/rust/pull/129705#issuecomment-2316373910 @bors r-

Aug 29 '24 02:08 workingjubilee

That is pretty weird, none of this stuff should be platform-dependent unless llvm is doing something wrong. I don't think anything else there could have caused it, but let's get results here just to be sure.

@bors try

@ChaiTRex you'll probably have to debug this locally. Or maybe just remove all the unsafe asserts and change debug_assert to assert and we can try again, to get a hint if that is where the problem is coming from.

Aug 29 '24 02:08 tgross35

:hourglass: Trying commit 8e54e871375616606771e4efec2d73409b3d70ce with merge 99b5a9b7a8b9f56c6cd7178bab7d45ee537cdc5d...

Aug 29 '24 02:08 bors

That is pretty weird, none of this stuff should be platform-dependent unless llvm is doing something wrong. I don't think anything else there could have caused it, but let's get results here just to be sure.

My guess is that Wasm doesn't support panic unwinding, so when I use catch_unwind in the tests to ensure that negative inputs panic, Wasm doesn't like that.

Someone suggested #[cfg(panic = "unwind")] for that part, so I'll work on that.

Aug 29 '24 02:08 ChaiTRex

That makes more sense. I'll retry after PR check finishes.

Aug 29 '24 03:08 tgross35

The job test-various failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

file:.git/config remote.origin.url=https://github.com/rust-lang-ci/rust
file:.git/config remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
file:.git/config gc.auto=0
file:.git/config http.https://github.com/.extraheader=AUTHORIZATION: basic ***
file:.git/config branch.try.remote=origin
file:.git/config branch.try.merge=refs/heads/try
file:.git/config submodule.library/backtrace.url=https://github.com/rust-lang/backtrace-rs.git
file:.git/config submodule.library/stdarch.active=true
file:.git/config submodule.library/stdarch.url=https://github.com/rust-lang/stdarch.git
file:.git/config submodule.src/doc/book.active=true
---
test num::int_log::ilog10_of_0_panic ... ignored
Error: failed to run main module `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-std/wasm32-wasip1/release/deps/coretests-725e4eb03cd34f4a.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x1889d5 - coretests-725e4eb03cd34f4a.wasm!__rust_start_panic
           1: 0x1884a7 - coretests-725e4eb03cd34f4a.wasm!rust_panic
           2: 0x188306 - coretests-725e4eb03cd34f4a.wasm!std::panicking::rust_panic_with_hook::h4b7cdc6a86811ab2
           3: 0x187153 - coretests-725e4eb03cd34f4a.wasm!std::panicking::begin_panic_handler::{{closure}}::h5d51873c650ae66a
           4: 0x187092 - coretests-725e4eb03cd34f4a.wasm!std::sys::backtrace::__rust_end_short_backtrace::h99bf0f7bbe9f5e25
           5: 0x187c42 - coretests-725e4eb03cd34f4a.wasm!rust_begin_unwind
           6: 0x195f5d - coretests-725e4eb03cd34f4a.wasm!core::panicking::panic_fmt::hf9f7de3dbe87d903
           7: 0x195fa1 - coretests-725e4eb03cd34f4a.wasm!core::num::int_sqrt::panic_for_negative_argument::hc1127e51220785cc
           8: 0x2073 - coretests-725e4eb03cd34f4a.wasm!std::panicking::try::do_call::h8729d35fde7e957c
           9: 0xc2cdd - coretests-725e4eb03cd34f4a.wasm!core::ops::function::FnOnce::call_once::hae34aa187e462869
          10: 0x179318 - coretests-725e4eb03cd34f4a.wasm!test::__rust_begin_short_backtrace::hed313d8f52171515
          11: 0x17917c - coretests-725e4eb03cd34f4a.wasm!test::types::RunnableTest::run::h7997ca9580bbaeb7
          12: 0x163115 - coretests-725e4eb03cd34f4a.wasm!test::run_test::h08c5263a618aa01e
          13: 0x15aec0 - coretests-725e4eb03cd34f4a.wasm!test::console::run_tests_console::h092bd764fe45d0b9
          14: 0x1794b6 - coretests-725e4eb03cd34f4a.wasm!test::test_main::h1b006f7d4c4b791a
          15: 0x17a16c - coretests-725e4eb03cd34f4a.wasm!test::test_main_static::h28974b6ae93adab2
          16: 0x1433f6 - coretests-725e4eb03cd34f4a.wasm!coretests::main::h5f5ac5fe77cac24f
          17: 0x1e55 - coretests-725e4eb03cd34f4a.wasm!std::sys::backtrace::__rust_begin_short_backtrace::h38b211e3e9744845
          18: 0x1e48 - coretests-725e4eb03cd34f4a.wasm!std::rt::lang_start::{{closure}}::h299149183f8c9350
          19: 0x181bbd - coretests-725e4eb03cd34f4a.wasm!std::rt::lang_start_internal::h15e4ecb84bb7fefe
          20: 0x14342e - coretests-725e4eb03cd34f4a.wasm!__main_void
          21: 0x1812 - coretests-725e4eb03cd34f4a.wasm!_start
       note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information
    2: wasm trap: wasm `unreachable` instruction executed
test num::int_log::ilog10_u16 ... ok
test num::int_log::ilog10_u32 ... ok
test num::int_log::ilog10_u64 ... ok
test num::int_log::ilog10_u8 ... ok
test num::int_log::ilog10_u8 ... ok
test num::int_log::ilog1_of_1_panic ... ignored
test num::int_log::ilog2_of_0_panic ... ignored
test num::int_log::ilog3_of_0_panic ... ignored
error: test failed, to rerun pass `-p core --test coretests`

Caused by:
  process didn't exit successfully: `/wasmtime-v19.0.0-x86_64-linux/wasmtime run -C cache=n --dir . --env RUSTC_BOOTSTRAP /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-std/wasm32-wasip1/release/deps/coretests-725e4eb03cd34f4a.wasm -Z unstable-options --format json` (exit status: 134)
note: test exited abnormally; to see the full output pass --nocapture to the harness.
  local time: Thu Aug 29 03:39:58 UTC 2024
  network time: Thu, 29 Aug 2024 03:39:58 GMT
##[error]Process completed with exit code 1.
Post job cleanup.

Aug 29 '24 03:08 rust-log-analyzer

Improved `checked_isqrt` and `isqrt` methods

Improved tests of isqrt and checked_isqrt implementations

Added benchmarks for isqrt implementations

Greatly sped up checked_isqrt and isqrt methods

Feature tracking issue

Command used to benchmark

Before

After

Before this pull request

When I first made this pull request

With the changes I'm going to push soon

Before changes

After changes

Before changes

After changes

Improved tests of `isqrt` and `checked_isqrt` implementations

Added benchmarks for `isqrt` implementations

Greatly sped up `checked_isqrt` and `isqrt` methods