Improved `checked_isqrt` and `isqrt` methods
Improved tests of isqrt and checked_isqrt implementations
- Inputs chosen more thoroughly and systematically.
- Checks that
isqrtandchecked_isqrthave equivalent results for signed types, either equivalent numerically or equivalent as a panic and aNone. - Checks that
isqrthas numerically-equivalent results for unsigned types and theirNonZerocounterparts.
Added benchmarks for isqrt implementations
Greatly sped up checked_isqrt and isqrt methods
- Uses a lookup table for 8-bit integers and then the Karatsuba square root algorithm for larger integers.
- Includes optimization hints that give the compiler the exact numeric range of results.
Feature tracking issue
isqrt is an unstable feature tracked at #116226.
Benchmarked improvements
Command used to benchmark
./x bench library/core -- int_sqrt
Before
benchmarks:
num::int_sqrt::i128::isqrt 439591.65/iter +/- 6652.70
num::int_sqrt::i16::isqrt 5302.97/iter +/- 160.93
num::int_sqrt::i32::isqrt 62999.11/iter +/- 2022.05
num::int_sqrt::i64::isqrt 125248.81/iter +/- 1674.43
num::int_sqrt::i8::isqrt 123.56/iter +/- 1.87
num::int_sqrt::isize::isqrt 125356.56/iter +/- 1017.03
num::int_sqrt::non_zero_u128::isqrt 437443.75/iter +/- 3535.43
num::int_sqrt::non_zero_u16::isqrt 8604.58/iter +/- 94.76
num::int_sqrt::non_zero_u32::isqrt 62933.33/iter +/- 517.30
num::int_sqrt::non_zero_u64::isqrt 125076.38/iter +/- 11340.61
num::int_sqrt::non_zero_u8::isqrt 221.51/iter +/- 1.58
num::int_sqrt::non_zero_usize::isqrt 136005.21/iter +/- 2020.35
num::int_sqrt::u128::isqrt 439014.55/iter +/- 3920.45
num::int_sqrt::u16::isqrt 8575.08/iter +/- 148.06
num::int_sqrt::u32::isqrt 63008.89/iter +/- 803.67
num::int_sqrt::u64::isqrt 125088.09/iter +/- 879.29
num::int_sqrt::u8::isqrt 230.18/iter +/- 2.04
num::int_sqrt::usize::isqrt 125237.51/iter +/- 4747.83
After
benchmarks:
num::int_sqrt::i128::isqrt 105184.89/iter +/- 1171.38
num::int_sqrt::i16::isqrt 1910.26/iter +/- 78.50
num::int_sqrt::i32::isqrt 34260.34/iter +/- 960.84
num::int_sqrt::i64::isqrt 45939.19/iter +/- 2525.65
num::int_sqrt::i8::isqrt 22.87/iter +/- 0.45
num::int_sqrt::isize::isqrt 45884.17/iter +/- 595.49
num::int_sqrt::non_zero_u128::isqrt 106344.27/iter +/- 780.99
num::int_sqrt::non_zero_u16::isqrt 2790.19/iter +/- 53.43
num::int_sqrt::non_zero_u32::isqrt 33613.99/iter +/- 362.96
num::int_sqrt::non_zero_u64::isqrt 46235.42/iter +/- 429.69
num::int_sqrt::non_zero_u8::isqrt 31.78/iter +/- 0.75
num::int_sqrt::non_zero_usize::isqrt 46208.75/iter +/- 375.27
num::int_sqrt::u128::isqrt 106385.94/iter +/- 1649.95
num::int_sqrt::u16::isqrt 2747.69/iter +/- 28.72
num::int_sqrt::u32::isqrt 33627.09/iter +/- 475.68
num::int_sqrt::u64::isqrt 46182.29/iter +/- 311.16
num::int_sqrt::u8::isqrt 33.10/iter +/- 0.30
num::int_sqrt::usize::isqrt 46165.00/iter +/- 388.41
r? @tgross35
rustbot has assigned @tgross35. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
The job mingw-check failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
#16 2.748 Building wheels for collected packages: reuse
#16 2.749 Building wheel for reuse (pyproject.toml): started
#16 3.000 Building wheel for reuse (pyproject.toml): finished with status 'done'
#16 3.002 Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#16 3.002 Stored in directory: /tmp/pip-ephem-wheel-cache-empqb9u2/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#16 3.005 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#16 3.406 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#16 3.406 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#16 DONE 3.5s
---
|
46 | mod int_sqrt;
| ^^^^^^^^^^^^^
|
= help: to create the module `int_sqrt`, create file "library/core/src/num/int_sqrt.rs" or "library/core/src/num/int_sqrt/mod.rs"
= note: if there is a `mod int_sqrt` elsewhere in the crate already, import it with `use crate::...` instead
Compiling std v0.0.0 (/checkout/library/std)
error[E0425]: cannot find function `u8` in module `super::int_sqrt`
--> library/core/src/num/nonzero.rs:1694:26
|
---
--> library/core/src/num/int_macros.rs:2781:39
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2781 | | crate::num::int_sqrt::panic_for_negative_argument();
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
... |
3616 | | }
---
--> library/core/src/num/int_macros.rs:2781:39
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2781 | | crate::num::int_sqrt::panic_for_negative_argument();
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
... |
3616 | | }
---
--> library/core/src/num/int_macros.rs:2781:39
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2781 | | crate::num::int_sqrt::panic_for_negative_argument();
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
... |
3616 | | }
---
--> library/core/src/num/int_macros.rs:2781:39
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2781 | | crate::num::int_sqrt::panic_for_negative_argument();
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
... |
3616 | | }
---
--> library/core/src/num/int_macros.rs:2781:39
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2781 | | crate::num::int_sqrt::panic_for_negative_argument();
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
... |
3616 | | }
---
--> library/core/src/num/int_macros.rs:2781:39
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2781 | | crate::num::int_sqrt::panic_for_negative_argument();
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `crate::num::int_sqrt`
... |
3616 | | }
---
--> library/core/src/num/int_macros.rs:2780:25
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2780 | | if self < 0 {
| | _________________________^
2781 | | | crate::num::int_sqrt::panic_for_negative_argument();
2782 | | | } else {
---
--> library/core/src/num/int_macros.rs:2780:25
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2780 | | if self < 0 {
| | _________________________^
2781 | | | crate::num::int_sqrt::panic_for_negative_argument();
2782 | | | } else {
---
--> library/core/src/num/int_macros.rs:2780:25
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2780 | | if self < 0 {
| | _________________________^
2781 | | | crate::num::int_sqrt::panic_for_negative_argument();
2782 | | | } else {
---
--> library/core/src/num/int_macros.rs:2780:25
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2780 | | if self < 0 {
| | _________________________^
2781 | | | crate::num::int_sqrt::panic_for_negative_argument();
2782 | | | } else {
---
--> library/core/src/num/int_macros.rs:2780:25
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2780 | | if self < 0 {
| | _________________________^
2781 | | | crate::num::int_sqrt::panic_for_negative_argument();
2782 | | | } else {
---
--> library/core/src/num/int_macros.rs:2780:25
|
1 | / macro_rules! int_impl {
2 | | (
3 | | Self = $SelfT:ty,
4 | | ActualT = $ActualT:ident,
2780 | | if self < 0 {
| | _________________________^
2781 | | | crate::num::int_sqrt::panic_for_negative_argument();
2782 | | | } else {
---
457 | | Self = isize,
458 | | ActualT = i64,
459 | | UnsignedT = usize,
... |
474 | | bound_condition = " on 64-bit targets",
475 | | }
| |_____- in this macro invocation
Some errors have detailed explanations: E0308, E0425, E0583.
For more information about an error, try `rustc --explain E0308`.
error: could not compile `core` (lib) due to 31 previous errors
Build completed unsuccessfully in 0:00:27
I'll take a closer look in a bit but FYI your last commit message has a typo (s/sped/speed)
The last force push added a missing file.
I'll take a closer look in a bit but FYI your last commit message has a typo (s/sped/speed)
OK, thanks for taking a look. Oh, I was using 'sped' as a past tense of the verb form of 'speed'.
I'll take a closer look in a bit but FYI your last commit message has a typo (s/sped/speed)
OK, thanks for taking a look. Oh, I was using 'sped' as a past tense of the verb form of 'speed'.
Ah, I see you did the same for the other commits too. Commit messages are usually in imperative mood to describe what the change does, i.e. "Improve test", "Add benchmarks" "Greatly speed up". Hence my confusion :)
(Rust has no commit message policy so no need to anything if you don't feel like it).
Looks pretty reasonable after a more thorough read through, awesome improvements to the tests. I think this should be good after combining some similar code, just to make things a bit easier to understand & maintain.
If you run benchmarks again, could you post your cpu model for a reference?
@rustbot author
If you run benchmarks again, could you post your cpu model for a reference?
I was using aarch64-apple-darwin on a MacBook Air with an M1 processor under macOS 14.5.1 (now 14.6).
After I replaced the benchmarks with equivalents to ilog10's benchmarks (which use an exponential distribution for the random inputs), I get the following:
Benchmarks
Before this pull request
benchmarks:
num::int_sqrt::u128_sqrt_predictable 108945.82/iter +/- 1673.86
num::int_sqrt::u128_sqrt_random 16389.17/iter +/- 231.44
num::int_sqrt::u128_sqrt_random_small 1290.22/iter +/- 41.90
num::int_sqrt::u16_sqrt_predictable 1080.06/iter +/- 43.44
num::int_sqrt::u16_sqrt_random 1064.57/iter +/- 20.79
num::int_sqrt::u16_sqrt_random_small 579.85/iter +/- 13.81
num::int_sqrt::u32_sqrt_predictable 2979.65/iter +/- 29.79
num::int_sqrt::u32_sqrt_random 1718.20/iter +/- 58.86
num::int_sqrt::u32_sqrt_random_small 480.19/iter +/- 26.59
num::int_sqrt::u64_sqrt_predictable 13089.55/iter +/- 484.72
num::int_sqrt::u64_sqrt_random 3751.04/iter +/- 79.66
num::int_sqrt::u64_sqrt_random_small 498.58/iter +/- 3.60
num::int_sqrt::u8_sqrt_predictable 329.07/iter +/- 12.64
num::int_sqrt::u8_sqrt_random 576.37/iter +/- 47.12
num::int_sqrt::u8_sqrt_random_small 589.01/iter +/- 22.53
When I first made this pull request
benchmarks:
num::int_sqrt::u128_sqrt_predictable 30728.33/iter +/- 1093.31
num::int_sqrt::u128_sqrt_random 4392.30/iter +/- 149.35
num::int_sqrt::u128_sqrt_random_small 323.58/iter +/- 7.26
num::int_sqrt::u16_sqrt_predictable 304.50/iter +/- 10.60
num::int_sqrt::u16_sqrt_random 334.78/iter +/- 18.02
num::int_sqrt::u16_sqrt_random_small 139.06/iter +/- 2.50
num::int_sqrt::u32_sqrt_predictable 1585.35/iter +/- 59.03
num::int_sqrt::u32_sqrt_random 972.40/iter +/- 38.47
num::int_sqrt::u32_sqrt_random_small 149.02/iter +/- 4.50
num::int_sqrt::u64_sqrt_predictable 5641.28/iter +/- 136.16
num::int_sqrt::u64_sqrt_random 1703.97/iter +/- 43.75
num::int_sqrt::u64_sqrt_random_small 197.42/iter +/- 4.03
num::int_sqrt::u8_sqrt_predictable 42.90/iter +/- 1.39
num::int_sqrt::u8_sqrt_random 133.55/iter +/- 3.92
num::int_sqrt::u8_sqrt_random_small 133.85/iter +/- 2.36
With the changes I'm going to push soon
benchmarks:
num::int_sqrt::u128_sqrt_predictable 26622.63/iter +/- 386.54
num::int_sqrt::u128_sqrt_random 3584.59/iter +/- 131.42
num::int_sqrt::u128_sqrt_random_small 904.23/iter +/- 22.04
num::int_sqrt::u16_sqrt_predictable 388.01/iter +/- 7.85
num::int_sqrt::u16_sqrt_random 480.00/iter +/- 16.07
num::int_sqrt::u16_sqrt_random_small 332.05/iter +/- 7.46
num::int_sqrt::u32_sqrt_predictable 1317.23/iter +/- 25.71
num::int_sqrt::u32_sqrt_random 783.17/iter +/- 8.53
num::int_sqrt::u32_sqrt_random_small 415.10/iter +/- 14.06
num::int_sqrt::u64_sqrt_predictable 4563.42/iter +/- 129.48
num::int_sqrt::u64_sqrt_random 1378.32/iter +/- 60.46
num::int_sqrt::u64_sqrt_random_small 658.41/iter +/- 8.93
num::int_sqrt::u8_sqrt_predictable 42.90/iter +/- 0.82
num::int_sqrt::u8_sqrt_random 132.98/iter +/- 1.53
num::int_sqrt::u8_sqrt_random_small 132.94/iter +/- 2.94
There's a tradeoff here. In the push I'm going to make soon, randomly chosen inputs from the whole input type range get a decent speedup from the initial pull request's code, but randomly chosen small inputs get about 3 times slower. Is that a good tradeoff? Is there another benchmark that should be added?
OK, I've pushed my latest changes.
Please ignore commits 31577f5 and 82389f4. Apparently HEAD was detached on my machine.
With a uniform distribution, we get these results:
Uniform distribution benchmarks
original_u8 time: [9.7782 ns 9.8095 ns 9.8501 ns]
karatsuba_u8 time: [5.2377 ns 5.2448 ns 5.2531 ns]
karatsuba_2_u8 time: [5.2304 ns 5.2347 ns 5.2399 ns]
original_u16 time: [13.415 ns 13.437 ns 13.465 ns]
karatsuba_u16 time: [6.7425 ns 6.7487 ns 6.7568 ns]
karatsuba_2_u16 time: [6.3988 ns 6.4035 ns 6.4094 ns]
original_u32 time: [19.016 ns 19.044 ns 19.078 ns]
karatsuba_u32 time: [10.162 ns 10.182 ns 10.211 ns]
karatsuba_2_u32 time: [8.8756 ns 8.8866 ns 8.8993 ns]
original_u64 time: [44.361 ns 44.413 ns 44.480 ns]
karatsuba_u64 time: [19.043 ns 19.144 ns 19.276 ns]
karatsuba_2_u64 time: [16.330 ns 16.359 ns 16.392 ns]
original_u128 time: [147.14 ns 147.49 ns 147.88 ns]
karatsuba_u128 time: [51.335 ns 51.455 ns 51.602 ns]
karatsuba_2_u128 time: [45.447 ns 45.507 ns 45.575 ns]
Thanks for the changes, this looks a lot tidier with the deduplication. I left a handful of new comments but those are just small things, nothing major.
I'm not sure whether you had been waiting on a review, but comment @rustbot ready whenever you are done. This will need a squash before merge but no need to do that until it is final.
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.
You can start a rebase with the following commands:
$ # rebase
$ git pull --rebase https://github.com/rust-lang/rust.git master
$ git push --force-with-lease
The following commits are merge commits:
- d61b4ddb16cf89aedfefa546f0566cc8c4041875
With regard to the squash, would it be alright to have two final commits, one for the testing and benchmarking and the next for the implementation? This would allow the benchmarking at least to be run on the prior implementation for comparison.
@rustbot ready
Aligning the lookup table to cache lines speeds up u8 on Apple M1:
Apple M1 benchmarks
Before cache line alignment:benchmarks:
num::int_sqrt::u128_sqrt_predictable 25464.64/iter +/- 1273.90
num::int_sqrt::u128_sqrt_random 3432.59/iter +/- 25.72
num::int_sqrt::u128_sqrt_random_small 808.12/iter +/- 17.95
num::int_sqrt::u128_sqrt_uniform 5913.12/iter +/- 373.72
num::int_sqrt::u16_sqrt_predictable 422.50/iter +/- 24.88
num::int_sqrt::u16_sqrt_random 521.85/iter +/- 20.16
num::int_sqrt::u16_sqrt_random_small 412.85/iter +/- 10.96
num::int_sqrt::u16_sqrt_uniform 618.18/iter +/- 21.46
num::int_sqrt::u32_sqrt_predictable 1267.82/iter +/- 40.84
num::int_sqrt::u32_sqrt_random 735.54/iter +/- 15.50
num::int_sqrt::u32_sqrt_random_small 413.31/iter +/- 7.09
num::int_sqrt::u32_sqrt_uniform 1043.55/iter +/- 21.41
num::int_sqrt::u64_sqrt_predictable 4151.74/iter +/- 109.88
num::int_sqrt::u64_sqrt_random 1276.77/iter +/- 27.18
num::int_sqrt::u64_sqrt_random_small 575.61/iter +/- 12.93
num::int_sqrt::u64_sqrt_uniform 1679.02/iter +/- 40.40
num::int_sqrt::u8_sqrt_predictable 42.26/iter +/- 0.88
num::int_sqrt::u8_sqrt_random 170.62/iter +/- 4.95
num::int_sqrt::u8_sqrt_random_small 170.88/iter +/- 3.79
num::int_sqrt::u8_sqrt_uniform 170.88/iter +/- 2.93
After cache line alignment:
benchmarks:
num::int_sqrt::u128_sqrt_predictable 25146.56/iter +/- 145.08
num::int_sqrt::u128_sqrt_random 3433.57/iter +/- 26.17
num::int_sqrt::u128_sqrt_random_small 818.61/iter +/- 53.39
num::int_sqrt::u128_sqrt_uniform 5891.31/iter +/- 147.90
num::int_sqrt::u16_sqrt_predictable 411.59/iter +/- 11.32
num::int_sqrt::u16_sqrt_random 527.51/iter +/- 25.60
num::int_sqrt::u16_sqrt_random_small 406.55/iter +/- 7.55
num::int_sqrt::u16_sqrt_uniform 617.40/iter +/- 19.75
num::int_sqrt::u32_sqrt_predictable 1255.94/iter +/- 21.66
num::int_sqrt::u32_sqrt_random 738.56/iter +/- 17.08
num::int_sqrt::u32_sqrt_random_small 409.39/iter +/- 8.64
num::int_sqrt::u32_sqrt_uniform 1040.39/iter +/- 21.58
num::int_sqrt::u64_sqrt_predictable 4144.49/iter +/- 264.87
num::int_sqrt::u64_sqrt_random 1280.68/iter +/- 31.78
num::int_sqrt::u64_sqrt_random_small 574.02/iter +/- 11.66
num::int_sqrt::u64_sqrt_uniform 1677.55/iter +/- 16.71
num::int_sqrt::u8_sqrt_predictable 42.19/iter +/- 0.80
num::int_sqrt::u8_sqrt_random 130.60/iter +/- 2.61
num::int_sqrt::u8_sqrt_random_small 130.79/iter +/- 2.30
num::int_sqrt::u8_sqrt_uniform 131.69/iter +/- 2.58
It has no effect on Ryzen 9 5900X:
Ryzen 9 5900X benchmarks
Before cache line alignment:
benchmarks:
num::int_sqrt::u128_sqrt_predictable 19691.09/iter +/- 741.04
num::int_sqrt::u128_sqrt_random 2894.61/iter +/- 38.25
num::int_sqrt::u128_sqrt_random_small 1154.15/iter +/- 13.97
num::int_sqrt::u128_sqrt_uniform 4114.75/iter +/- 239.00
num::int_sqrt::u16_sqrt_predictable 536.08/iter +/- 24.90
num::int_sqrt::u16_sqrt_random 593.93/iter +/- 14.95
num::int_sqrt::u16_sqrt_random_small 503.55/iter +/- 8.22
num::int_sqrt::u16_sqrt_uniform 844.66/iter +/- 6.84
num::int_sqrt::u32_sqrt_predictable 1623.75/iter +/- 25.16
num::int_sqrt::u32_sqrt_random 870.58/iter +/- 8.56
num::int_sqrt::u32_sqrt_random_small 534.55/iter +/- 22.62
num::int_sqrt::u32_sqrt_uniform 1344.67/iter +/- 26.97
num::int_sqrt::u64_sqrt_predictable 4681.89/iter +/- 173.77
num::int_sqrt::u64_sqrt_random 1392.21/iter +/- 51.08
num::int_sqrt::u64_sqrt_random_small 748.14/iter +/- 7.08
num::int_sqrt::u64_sqrt_uniform 1854.12/iter +/- 60.87
num::int_sqrt::u8_sqrt_predictable 43.19/iter +/- 0.97
num::int_sqrt::u8_sqrt_random 114.65/iter +/- 2.01
num::int_sqrt::u8_sqrt_random_small 113.95/iter +/- 1.03
num::int_sqrt::u8_sqrt_uniform 114.16/iter +/- 1.29
After cache line alignment:
benchmarks:
num::int_sqrt::u128_sqrt_predictable 19642.04/iter +/- 206.92
num::int_sqrt::u128_sqrt_random 2830.08/iter +/- 141.34
num::int_sqrt::u128_sqrt_random_small 1153.06/iter +/- 13.33
num::int_sqrt::u128_sqrt_uniform 4076.04/iter +/- 234.03
num::int_sqrt::u16_sqrt_predictable 526.21/iter +/- 28.71
num::int_sqrt::u16_sqrt_random 581.78/iter +/- 8.87
num::int_sqrt::u16_sqrt_random_small 494.27/iter +/- 5.85
num::int_sqrt::u16_sqrt_uniform 828.62/iter +/- 16.26
num::int_sqrt::u32_sqrt_predictable 1611.76/iter +/- 12.15
num::int_sqrt::u32_sqrt_random 892.10/iter +/- 6.39
num::int_sqrt::u32_sqrt_random_small 547.66/iter +/- 3.65
num::int_sqrt::u32_sqrt_uniform 1326.88/iter +/- 37.17
num::int_sqrt::u64_sqrt_predictable 4748.65/iter +/- 48.63
num::int_sqrt::u64_sqrt_random 1434.99/iter +/- 17.58
num::int_sqrt::u64_sqrt_random_small 766.33/iter +/- 9.79
num::int_sqrt::u64_sqrt_uniform 1902.92/iter +/- 16.71
num::int_sqrt::u8_sqrt_predictable 43.87/iter +/- 2.40
num::int_sqrt::u8_sqrt_random 116.94/iter +/- 3.94
num::int_sqrt::u8_sqrt_random_small 116.49/iter +/- 1.41
num::int_sqrt::u8_sqrt_uniform 116.40/iter +/- 0.80
@rustbot author
The job mingw-check failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
#16 2.759 Building wheels for collected packages: reuse
#16 2.760 Building wheel for reuse (pyproject.toml): started
#16 3.006 Building wheel for reuse (pyproject.toml): finished with status 'done'
#16 3.007 Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#16 3.008 Stored in directory: /tmp/pip-ephem-wheel-cache-zyqgdrpr/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#16 3.011 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#16 3.407 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#16 3.408 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#16 DONE 3.5s
---
Compiling std v0.0.0 (/checkout/library/std)
error: unused import: `hint`
--> core/src/num/nonzero.rs:10:18
|
10 | use crate::{fmt, hint, intrinsics, ptr, ub_checks};
|
= note: `-D unused-imports` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unused_imports)]`
OK, I've force pushed the modified history.
Verification that file changes before history rewrite and after history rewrite are identical
Last commit before modifying history: 35fb40cace0b75860a97d9b71f27fdefcd51242b.
$ git diff 35fb40cace0b75860a97d9b71f27fdefcd51242b
$ git log -n 3
commit e1d06300d4e89ce6cd5ff9f92acc06cc49e37bd3 (HEAD -> isqrt)
Author: Chai T. Rex <[email protected]>
Date: Mon Aug 26 01:36:25 2024 -0400
Speed up `checked_isqrt` and `isqrt` methods
* Use a lookup table for 8-bit integers and the Karatsuba square root
algorithm for larger integers.
* Include optimization hints that give the compiler the exact numeric
range of results.
commit 6cd9bcdd815dd43256ca314b962788db639c5c6c
Author: Chai T. Rex <[email protected]>
Date: Mon Aug 26 01:34:25 2024 -0400
Improve `isqrt` tests and add benchmarks
* Choose test inputs more thoroughly and systematically.
* Check that `isqrt` and `checked_isqrt` have equivalent results for
signed types, either equivalent numerically or equivalent as a panic
and a `None`.
* Check that `isqrt` has numerically-equivalent results for unsigned
types and their `NonZero` counterparts.
* Reuse `ilog10` benchmarks, plus benchmarks that use a uniform
distribution.
commit 636d7ff91b9847d6d43c7bbe023568828f6e3246
Merge: 5601d142498 77303568c04
Author: bors <[email protected]>
Date: Mon Aug 19 20:42:45 2024 +0000
Auto merge of #129275 - matthiaskrgr:rollup-qv64hg6, r=matthiaskrgr
Rollup of 6 pull requests
Successful merges:
- #129194 (Fix bootstrap test `detect_src_and_out` on Windows)
- #129217 (safe transmute: check lifetimes)
- #129223 ( Fix wrong argument for `get_fn_decl`)
- #129235 (Check that `#[may_dangle]` is properly applied)
- #129245 (Fix a typo in `rustc_hir` doc comment)
- #129271 (Prevent double panic in query system, improve diagnostics)
r? `@ghost`
`@rustbot` modify labels: rollup
Verification that file changes before history rewrite and after history rewrite are identical
GitHub did that already :) from the compare link it gives your push https://github.com/rust-lang/rust/compare/35fb40cace0b75860a97d9b71f27fdefcd51242b..e1d06300d4e89ce6cd5ff9f92acc06cc49e37bd3 (works as long as the base doesn't change, otherwise it's terrible).
This looks awesome, thank you for putting all the effort in! Post benchmarks after the final changes if you get the chance.
@bors r+
:pushpin: Commit e1d06300d4e89ce6cd5ff9f92acc06cc49e37bd3 has been approved by tgross35
It is now in the queue for this repository.
This looks awesome, thank you for putting all the effort in!
Thank you for all the suggestions for improvements!
Post benchmarks after the final changes if you get the chance.
Apple M1 benchmarks
Before changes
benchmarks:
num::int_sqrt::u128_sqrt_predictable 114378.64/iter +/- 2533.28
num::int_sqrt::u128_sqrt_random 16750.05/iter +/- 737.74
num::int_sqrt::u128_sqrt_random_small 1258.67/iter +/- 41.06
num::int_sqrt::u128_sqrt_uniform 33110.56/iter +/- 1104.33
num::int_sqrt::u16_sqrt_predictable 1117.65/iter +/- 55.38
num::int_sqrt::u16_sqrt_random 1116.89/iter +/- 1201.61
num::int_sqrt::u16_sqrt_random_small 573.27/iter +/- 16.78
num::int_sqrt::u16_sqrt_uniform 2072.08/iter +/- 111.98
num::int_sqrt::u32_sqrt_predictable 3065.00/iter +/- 206.74
num::int_sqrt::u32_sqrt_random 1791.38/iter +/- 93.77
num::int_sqrt::u32_sqrt_random_small 461.37/iter +/- 24.58
num::int_sqrt::u32_sqrt_uniform 3788.21/iter +/- 267.84
num::int_sqrt::u64_sqrt_predictable 13204.51/iter +/- 97.78
num::int_sqrt::u64_sqrt_random 3728.35/iter +/- 169.88
num::int_sqrt::u64_sqrt_random_small 478.55/iter +/- 17.05
num::int_sqrt::u64_sqrt_uniform 9146.69/iter +/- 156.14
num::int_sqrt::u8_sqrt_predictable 333.17/iter +/- 2.83
num::int_sqrt::u8_sqrt_random 567.97/iter +/- 5.24
num::int_sqrt::u8_sqrt_random_small 576.73/iter +/- 14.10
num::int_sqrt::u8_sqrt_uniform 952.84/iter +/- 33.36
After changes
benchmarks:
num::int_sqrt::u128_sqrt_predictable 24316.02/iter +/- 1175.84
num::int_sqrt::u128_sqrt_random 3290.37/iter +/- 57.26
num::int_sqrt::u128_sqrt_random_small 227.77/iter +/- 19.86
num::int_sqrt::u128_sqrt_uniform 5799.31/iter +/- 171.87
num::int_sqrt::u16_sqrt_predictable 261.07/iter +/- 6.76
num::int_sqrt::u16_sqrt_random 288.43/iter +/- 5.81
num::int_sqrt::u16_sqrt_random_small 136.94/iter +/- 6.20
num::int_sqrt::u16_sqrt_uniform 556.81/iter +/- 20.06
num::int_sqrt::u32_sqrt_predictable 1153.25/iter +/- 73.52
num::int_sqrt::u32_sqrt_random 701.98/iter +/- 31.71
num::int_sqrt::u32_sqrt_random_small 159.10/iter +/- 4.83
num::int_sqrt::u32_sqrt_uniform 1055.76/iter +/- 18.27
num::int_sqrt::u64_sqrt_predictable 3979.86/iter +/- 52.21
num::int_sqrt::u64_sqrt_random 1178.71/iter +/- 16.93
num::int_sqrt::u64_sqrt_random_small 250.95/iter +/- 7.06
num::int_sqrt::u64_sqrt_uniform 1740.82/iter +/- 39.35
num::int_sqrt::u8_sqrt_predictable 42.29/iter +/- 1.12
num::int_sqrt::u8_sqrt_random 132.03/iter +/- 4.45
num::int_sqrt::u8_sqrt_random_small 132.94/iter +/- 5.08
num::int_sqrt::u8_sqrt_uniform 131.97/iter +/- 3.98
AMD Ryzen 9 5900X benchmarks
Before changes
benchmarks:
num::int_sqrt::u128_sqrt_predictable 107571.01/iter +/- 3586.42
num::int_sqrt::u128_sqrt_random 14957.03/iter +/- 171.20
num::int_sqrt::u128_sqrt_random_small 1631.80/iter +/- 15.79
num::int_sqrt::u128_sqrt_uniform 29913.22/iter +/- 491.33
num::int_sqrt::u16_sqrt_predictable 1028.91/iter +/- 6.69
num::int_sqrt::u16_sqrt_random 952.49/iter +/- 5.02
num::int_sqrt::u16_sqrt_random_small 571.79/iter +/- 6.40
num::int_sqrt::u16_sqrt_uniform 1698.65/iter +/- 23.04
num::int_sqrt::u32_sqrt_predictable 2795.33/iter +/- 12.68
num::int_sqrt::u32_sqrt_random 1573.76/iter +/- 8.02
num::int_sqrt::u32_sqrt_random_small 543.35/iter +/- 3.44
num::int_sqrt::u32_sqrt_uniform 2963.37/iter +/- 13.95
num::int_sqrt::u64_sqrt_predictable 10291.50/iter +/- 135.62
num::int_sqrt::u64_sqrt_random 3012.46/iter +/- 42.96
num::int_sqrt::u64_sqrt_random_small 550.93/iter +/- 5.55
num::int_sqrt::u64_sqrt_uniform 6346.38/iter +/- 290.38
num::int_sqrt::u8_sqrt_predictable 343.44/iter +/- 9.28
num::int_sqrt::u8_sqrt_random 567.52/iter +/- 3.47
num::int_sqrt::u8_sqrt_random_small 567.99/iter +/- 12.27
num::int_sqrt::u8_sqrt_uniform 922.86/iter +/- 18.49
After changes
benchmarks:
num::int_sqrt::u128_sqrt_predictable 17672.56/iter +/- 233.18
num::int_sqrt::u128_sqrt_random 2762.53/iter +/- 127.37
num::int_sqrt::u128_sqrt_random_small 225.04/iter +/- 10.44
num::int_sqrt::u128_sqrt_uniform 3889.07/iter +/- 52.62
num::int_sqrt::u16_sqrt_predictable 373.94/iter +/- 4.32
num::int_sqrt::u16_sqrt_random 355.54/iter +/- 3.56
num::int_sqrt::u16_sqrt_random_small 115.20/iter +/- 1.77
num::int_sqrt::u16_sqrt_uniform 709.20/iter +/- 13.64
num::int_sqrt::u32_sqrt_predictable 1308.96/iter +/- 13.35
num::int_sqrt::u32_sqrt_random 823.75/iter +/- 15.02
num::int_sqrt::u32_sqrt_random_small 222.49/iter +/- 52.82
num::int_sqrt::u32_sqrt_uniform 1253.50/iter +/- 30.43
num::int_sqrt::u64_sqrt_predictable 4170.12/iter +/- 77.05
num::int_sqrt::u64_sqrt_random 1312.70/iter +/- 26.43
num::int_sqrt::u64_sqrt_random_small 278.75/iter +/- 4.11
num::int_sqrt::u64_sqrt_uniform 1813.38/iter +/- 27.48
num::int_sqrt::u8_sqrt_predictable 43.85/iter +/- 1.51
num::int_sqrt::u8_sqrt_random 110.46/iter +/- 0.90
num::int_sqrt::u8_sqrt_random_small 110.51/iter +/- 1.15
num::int_sqrt::u8_sqrt_uniform 110.56/iter +/- 0.96
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.
You can start a rebase with the following commands:
$ # rebase
$ git pull --rebase https://github.com/rust-lang/rust.git master
$ git push --force-with-lease
The following commits are merge commits:
- a1d27b868ca651fd0295e63c6c66fb69f1f3543b
Resolved merge conflict
@rustbot ready
Thanks!
@bors r+
:pushpin: Commit 8e54e871375616606771e4efec2d73409b3d70ce has been approved by tgross35
It is now in the queue for this repository.
Pretty sure this nailed this rollup: https://github.com/rust-lang/rust/pull/129705#issuecomment-2316373910 @bors r-
That is pretty weird, none of this stuff should be platform-dependent unless llvm is doing something wrong. I don't think anything else there could have caused it, but let's get results here just to be sure.
@bors try
@ChaiTRex you'll probably have to debug this locally. Or maybe just remove all the unsafe asserts and change debug_assert to assert and we can try again, to get a hint if that is where the problem is coming from.
:hourglass: Trying commit 8e54e871375616606771e4efec2d73409b3d70ce with merge 99b5a9b7a8b9f56c6cd7178bab7d45ee537cdc5d...
That is pretty weird, none of this stuff should be platform-dependent unless llvm is doing something wrong. I don't think anything else there could have caused it, but let's get results here just to be sure.
My guess is that Wasm doesn't support panic unwinding, so when I use catch_unwind in the tests to ensure that negative inputs panic, Wasm doesn't like that.
Someone suggested #[cfg(panic = "unwind")] for that part, so I'll work on that.
That makes more sense. I'll retry after PR check finishes.
The job test-various failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
file:.git/config remote.origin.url=https://github.com/rust-lang-ci/rust
file:.git/config remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
file:.git/config gc.auto=0
file:.git/config http.https://github.com/.extraheader=AUTHORIZATION: basic ***
file:.git/config branch.try.remote=origin
file:.git/config branch.try.merge=refs/heads/try
file:.git/config submodule.library/backtrace.url=https://github.com/rust-lang/backtrace-rs.git
file:.git/config submodule.library/stdarch.active=true
file:.git/config submodule.library/stdarch.url=https://github.com/rust-lang/stdarch.git
file:.git/config submodule.src/doc/book.active=true
---
test num::int_log::ilog10_of_0_panic ... ignored
Error: failed to run main module `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-std/wasm32-wasip1/release/deps/coretests-725e4eb03cd34f4a.wasm`
Caused by:
0: failed to invoke command default
1: error while executing at wasm backtrace:
0: 0x1889d5 - coretests-725e4eb03cd34f4a.wasm!__rust_start_panic
1: 0x1884a7 - coretests-725e4eb03cd34f4a.wasm!rust_panic
2: 0x188306 - coretests-725e4eb03cd34f4a.wasm!std::panicking::rust_panic_with_hook::h4b7cdc6a86811ab2
3: 0x187153 - coretests-725e4eb03cd34f4a.wasm!std::panicking::begin_panic_handler::{{closure}}::h5d51873c650ae66a
4: 0x187092 - coretests-725e4eb03cd34f4a.wasm!std::sys::backtrace::__rust_end_short_backtrace::h99bf0f7bbe9f5e25
5: 0x187c42 - coretests-725e4eb03cd34f4a.wasm!rust_begin_unwind
6: 0x195f5d - coretests-725e4eb03cd34f4a.wasm!core::panicking::panic_fmt::hf9f7de3dbe87d903
7: 0x195fa1 - coretests-725e4eb03cd34f4a.wasm!core::num::int_sqrt::panic_for_negative_argument::hc1127e51220785cc
8: 0x2073 - coretests-725e4eb03cd34f4a.wasm!std::panicking::try::do_call::h8729d35fde7e957c
9: 0xc2cdd - coretests-725e4eb03cd34f4a.wasm!core::ops::function::FnOnce::call_once::hae34aa187e462869
10: 0x179318 - coretests-725e4eb03cd34f4a.wasm!test::__rust_begin_short_backtrace::hed313d8f52171515
11: 0x17917c - coretests-725e4eb03cd34f4a.wasm!test::types::RunnableTest::run::h7997ca9580bbaeb7
12: 0x163115 - coretests-725e4eb03cd34f4a.wasm!test::run_test::h08c5263a618aa01e
13: 0x15aec0 - coretests-725e4eb03cd34f4a.wasm!test::console::run_tests_console::h092bd764fe45d0b9
14: 0x1794b6 - coretests-725e4eb03cd34f4a.wasm!test::test_main::h1b006f7d4c4b791a
15: 0x17a16c - coretests-725e4eb03cd34f4a.wasm!test::test_main_static::h28974b6ae93adab2
16: 0x1433f6 - coretests-725e4eb03cd34f4a.wasm!coretests::main::h5f5ac5fe77cac24f
17: 0x1e55 - coretests-725e4eb03cd34f4a.wasm!std::sys::backtrace::__rust_begin_short_backtrace::h38b211e3e9744845
18: 0x1e48 - coretests-725e4eb03cd34f4a.wasm!std::rt::lang_start::{{closure}}::h299149183f8c9350
19: 0x181bbd - coretests-725e4eb03cd34f4a.wasm!std::rt::lang_start_internal::h15e4ecb84bb7fefe
20: 0x14342e - coretests-725e4eb03cd34f4a.wasm!__main_void
21: 0x1812 - coretests-725e4eb03cd34f4a.wasm!_start
note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information
2: wasm trap: wasm `unreachable` instruction executed
test num::int_log::ilog10_u16 ... ok
test num::int_log::ilog10_u32 ... ok
test num::int_log::ilog10_u64 ... ok
test num::int_log::ilog10_u8 ... ok
test num::int_log::ilog10_u8 ... ok
test num::int_log::ilog1_of_1_panic ... ignored
test num::int_log::ilog2_of_0_panic ... ignored
test num::int_log::ilog3_of_0_panic ... ignored
error: test failed, to rerun pass `-p core --test coretests`
Caused by:
process didn't exit successfully: `/wasmtime-v19.0.0-x86_64-linux/wasmtime run -C cache=n --dir . --env RUSTC_BOOTSTRAP /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-std/wasm32-wasip1/release/deps/coretests-725e4eb03cd34f4a.wasm -Z unstable-options --format json` (exit status: 134)
note: test exited abnormally; to see the full output pass --nocapture to the harness.
local time: Thu Aug 29 03:39:58 UTC 2024
network time: Thu, 29 Aug 2024 03:39:58 GMT
##[error]Process completed with exit code 1.
Post job cleanup.