num-bigint icon indicating copy to clipboard operation
num-bigint copied to clipboard

Base on SmallVec instead of Vec

Open kvark opened this issue 6 years ago • 19 comments

It would be great to allow for BigInt and alike to be used in constant expressions. This is not possible with Vec, but if we use SmallVec instead, we might be able to make this possible. See https://github.com/servo/rust-smallvec/issues/86

kvark avatar Mar 19 '18 16:03 kvark

AFAICT, smallvec only tests with the latest stable rustc, which is not a great fit for num-bigint since we hold a stronger guarantees about rustc requirements -- currently targeting 1.15 for the development on num-bigint 0.2.

It might be feasible to implement our own internal SmallVec though, since we'd only need a smaller subset of the normal Vec-like API. It will need to be thoroughly benchmarked -- avoiding the heap is a benefit, but the added conditional branches could hurt. It's not an obvious win.

On constants -- I doubt we'll change the traits as you're proposing in rust-num/num-traits#54, but at least with a small-vec implementation, it should be possible to add such constants on the BigUint/BigInt types themselves.

cuviper avatar Mar 19 '18 17:03 cuviper

smallvec only tests with the latest stable rustc, which is not a great fit for num-bigint since we hold a stronger guarantees about rustc requirements -- currently targeting 1.15 for the development on num-bigint 0.2.

True, but we can easily fix that. SmallVec belongs to Servo org, which I'm a member of. I'm sure we can figure this out ;)

It might be feasible to implement our own internal SmallVec though, since we'd only need a smaller subset of the normal Vec-like API. It will need to be thoroughly benchmarked -- avoiding the heap is a benefit, but the added conditional branches could hurt. It's not an obvious win.

I see. Alternative approach would be to have some sort of a generic parameter for allocation of BigInt, making sure that instances created from different allocations can still inter-operate. It would basically mean lifting the mentioned conditional branch into the type system and transitioning the cost from run-time to compile time (plus, a bit of the API complexity).

kvark avatar Mar 19 '18 19:03 kvark

I noticed that this is marked "help wanted." I might be interested in taking it on. If so I would imagine implementing a SmallVec version of one of the BigInts first, as a relatively easy exploratory option (and benchmarking) and then taking some course of action based on what we learn from that and whether or not SmallVec will adapt to meet num-bigint's testing requirements, etc.

If that sounds reasonable, I'll get started. I have a few hours free each week so it might take several weekly iterations.

itsybitesyspider avatar Jun 02 '18 01:06 itsybitesyspider

@clanehin: Sure, if you're interested to explore this, that would be great!

cuviper avatar Jun 02 '18 04:06 cuviper

Just updating. I did some work into this and plan to continue. You're probably anticipating this, but it seems like the change would necessarily be visible to external users of the API.

itsybitesyspider avatar Jun 09 '18 01:06 itsybitesyspider

I don't expect public API changes. What are you proposing?

cuviper avatar Jun 09 '18 06:06 cuviper

What are you proposing?

Nothing, if it can be helped, but the implementation leaks slightly. For example, unless I'm about to learn something really fascinating about rust (which I very well might be), returning &mut Vec<BigDigit> in this trait won't work:

pub trait IntDigits {
    fn digits(&self) -> &[BigDigit];
    fn digits_mut(&mut self) -> &mut Vec<BigDigit>;
    fn normalize(&mut self);
    fn capacity(&self) -> usize;
    fn len(&self) -> usize;
}

It may also change what will be considered idiomatic, such as BigInt::new() taking ownership of a Vec.

itsybitesyspider avatar Jun 13 '18 00:06 itsybitesyspider

IntDigits is not actually public, since it's in a private module. It can change however we like.

BigInt::new() is annoying yes, but that's also a problem if we ever want to change to a different digit size. I think we just have to accept that it may not always use that direct buffer as it implies.

cuviper avatar Jun 13 '18 00:06 cuviper

any progress?

programmerjake avatar Aug 10 '19 21:08 programmerjake

I messed around with this on a branch in my fork of this repo. A naive implementation passed all tests, but was slightly slower for the (rather limited) benches we have.

SmallVec
test factorial_div_biguint ... bench:   1,146,430 ns/iter (+/- 3,635)
test factorial_div_u32     ... bench:     837,139 ns/iter (+/- 3,772)
test factorial_mul_biguint ... bench:     176,384 ns/iter (+/- 4,271)
test factorial_mul_u32     ... bench:      84,485 ns/iter (+/- 724)

Vec
test factorial_div_biguint ... bench:     895,277 ns/iter (+/- 37,985)
test factorial_div_u32     ... bench:     825,572 ns/iter (+/- 5,411)
test factorial_mul_biguint ... bench:     186,093 ns/iter (+/- 1,880)
test factorial_mul_u32     ... bench:      78,077 ns/iter (+/- 975)

maxbla avatar Nov 15 '19 18:11 maxbla

Note that, while a slight performance hit would be incurred by the extra logic of SmallVec, there is probably gonna be a larger gain when using biguints in a large application, because SmallVec can avoid a lot of cache misses when the digits are stored inline. (That said, I looked into your implementation, and making the SmallVecArray have 32 digits seems excessive; I'd have gone with (std::mem::size_of::<usize>() * 2) / std::mem::size_of::<BigDigit>).

carado avatar Jul 01 '21 12:07 carado

I tried making my own fork and I'm getting similar to slightly better benchmark results to the original code; though note that I'm using smallvec with features union and specialization enabled to help performance.

SmallVec
test factorial_div_biguint ... bench:     834,889 ns/iter (+/- 29,744)
test factorial_div_u32     ... bench:     823,152 ns/iter (+/- 25,495)
test factorial_mul_biguint ... bench:      47,004 ns/iter (+/- 1,126)
test factorial_mul_u32     ... bench:      42,630 ns/iter (+/- 2,734)

Vec
test factorial_div_biguint ... bench:     858,601 ns/iter (+/- 11,518)
test factorial_div_u32     ... bench:     831,284 ns/iter (+/- 16,941)
test factorial_mul_biguint ... bench:      64,631 ns/iter (+/- 4,383)
test factorial_mul_u32     ... bench:      42,382 ns/iter (+/- 957)

But, again, the gains from cache locality in large applications could be a lot more significant than what simple benchmarks can test (or maybe there are such benchmarks in num-bigint; I'm really just assuming).

carado avatar Jul 01 '21 13:07 carado

Specialization won't be acceptable here, as it requires a nightly compiler, but could you benchmark with and without union?

There are more benchmarks in the project if you enable the "rand" feature. Still it would be great to add better benchmarks, or to gather performance numbers from use in a real project!

cuviper avatar Jul 01 '21 20:07 cuviper

Oh, I didn't think of testing and benching with features enabled. Here's the results for cargo bench --all-features. (Maybe a nightly feature allowing users to enable specialization on smallvec when they use nightly would be nice?)

Normal Vec implementation:

     Running unittests (target/release/deps/bigint-79048b53fbe03cd6)

running 53 tests
test divide_0             ... bench:          88 ns/iter (+/- 3)
test divide_1             ... bench:       1,494 ns/iter (+/- 115)
test divide_2             ... bench:      99,967 ns/iter (+/- 4,453)
test divide_big_little    ... bench:      15,186 ns/iter (+/- 549)
test fac_to_string        ... bench:         862 ns/iter (+/- 43)
test factorial_100        ... bench:       3,123 ns/iter (+/- 105)
test fib2_100             ... bench:       1,603 ns/iter (+/- 79)
test fib2_1000            ... bench:      17,495 ns/iter (+/- 808)
test fib2_10000           ... bench:     576,920 ns/iter (+/- 31,803)
test fib_100              ... bench:         756 ns/iter (+/- 201)
test fib_1000             ... bench:       8,719 ns/iter (+/- 906)
test fib_10000            ... bench:     255,007 ns/iter (+/- 8,356)
test fib_to_string        ... bench:         126 ns/iter (+/- 5)
test from_str_radix_02    ... bench:       2,630 ns/iter (+/- 103)
test from_str_radix_08    ... bench:         977 ns/iter (+/- 59)
test from_str_radix_10    ... bench:       1,016 ns/iter (+/- 29)
test from_str_radix_16    ... bench:       1,146 ns/iter (+/- 64)
test from_str_radix_36    ... bench:       1,010 ns/iter (+/- 39)
test hash                 ... bench:      67,447 ns/iter (+/- 2,318)
test iter_u32_digits      ... bench:          63 ns/iter (+/- 1)
test iter_u64_digits      ... bench:          40 ns/iter (+/- 12)
test modpow               ... bench:   5,006,623 ns/iter (+/- 252,751)
test modpow_even          ... bench:   9,417,018 ns/iter (+/- 295,978)
test multiply_0           ... bench:          54 ns/iter (+/- 6)
test multiply_1           ... bench:       4,142 ns/iter (+/- 152)
test multiply_2           ... bench:     252,070 ns/iter (+/- 12,960)
test multiply_3           ... bench:     567,744 ns/iter (+/- 67,021)
test pow_bench            ... bench:   2,869,891 ns/iter (+/- 280,750)
test pow_bench_1e1000     ... bench:       1,358 ns/iter (+/- 152)
test pow_bench_1e10000    ... bench:      42,472 ns/iter (+/- 1,896)
test pow_bench_1e100000   ... bench:   1,291,335 ns/iter (+/- 69,623)
test pow_bench_bigexp     ... bench:   2,842,016 ns/iter (+/- 151,826)
test rand_1009            ... bench:         102 ns/iter (+/- 14)
test rand_131072          ... bench:       5,048 ns/iter (+/- 928)
test rand_2048            ... bench:         126 ns/iter (+/- 2)
test rand_256             ... bench:          54 ns/iter (+/- 1)
test rand_4096            ... bench:         214 ns/iter (+/- 6)
test rand_64              ... bench:          41 ns/iter (+/- 1)
test rand_65536           ... bench:       2,564 ns/iter (+/- 384)
test rand_8192            ... bench:         372 ns/iter (+/- 14)
test remainder_0          ... bench:          91 ns/iter (+/- 4)
test remainder_1          ... bench:       1,437 ns/iter (+/- 77)
test remainder_2          ... bench:      98,653 ns/iter (+/- 5,557)
test remainder_big_little ... bench:      15,081 ns/iter (+/- 498)
test shl                  ... bench:         837 ns/iter (+/- 40)
test shr                  ... bench:       1,285 ns/iter (+/- 32)
test to_str_radix_02      ... bench:       2,336 ns/iter (+/- 100)
test to_str_radix_08      ... bench:         851 ns/iter (+/- 11)
test to_str_radix_10      ... bench:       2,275 ns/iter (+/- 45)
test to_str_radix_16      ... bench:         639 ns/iter (+/- 15)
test to_str_radix_36      ... bench:       5,139 ns/iter (+/- 101)
test to_u32_digits        ... bench:          98 ns/iter (+/- 2)
test to_u64_digits        ... bench:          41 ns/iter (+/- 3)

test result: ok. 0 passed; 0 failed; 0 ignored; 53 measured; 0 filtered out; finished in 141.49s

     Running unittests (target/release/deps/factorial-8a44d63636a4081e)

running 4 tests
test factorial_div_biguint ... bench:     858,930 ns/iter (+/- 23,678)
test factorial_div_u32     ... bench:     832,476 ns/iter (+/- 19,487)
test factorial_mul_biguint ... bench:      65,023 ns/iter (+/- 2,033)
test factorial_mul_u32     ... bench:      41,388 ns/iter (+/- 1,133)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 6.31s

     Running unittests (target/release/deps/gcd-b60898ea20c35119)

running 8 tests
test gcd_euclid_0064 ... bench:       1,124 ns/iter (+/- 18)
test gcd_euclid_0256 ... bench:      18,498 ns/iter (+/- 257)
test gcd_euclid_1024 ... bench:     109,739 ns/iter (+/- 1,600)
test gcd_euclid_4096 ... bench:     772,073 ns/iter (+/- 9,020)
test gcd_stein_0064  ... bench:       1,354 ns/iter (+/- 40)
test gcd_stein_0256  ... bench:       5,285 ns/iter (+/- 118)
test gcd_stein_1024  ... bench:      22,980 ns/iter (+/- 356)
test gcd_stein_4096  ... bench:     150,925 ns/iter (+/- 3,734)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 8.92s

     Running unittests (target/release/deps/roots-8feee7ad652de5d6)

running 21 tests
test big1k_cbrt      ... bench:       2,461 ns/iter (+/- 81)
test big1k_nth_10    ... bench:       2,290 ns/iter (+/- 88)
test big1k_nth_100   ... bench:         895 ns/iter (+/- 10)
test big1k_nth_1000  ... bench:       4,034 ns/iter (+/- 87)
test big1k_nth_10000 ... bench:          19 ns/iter (+/- 0)
test big1k_sqrt      ... bench:       1,753 ns/iter (+/- 66)
test big2k_cbrt      ... bench:       4,712 ns/iter (+/- 246)
test big2k_nth_10    ... bench:       4,246 ns/iter (+/- 125)
test big2k_nth_100   ... bench:       4,431 ns/iter (+/- 116)
test big2k_nth_1000  ... bench:       6,148 ns/iter (+/- 307)
test big2k_nth_10000 ... bench:          19 ns/iter (+/- 2)
test big2k_sqrt      ... bench:       4,291 ns/iter (+/- 169)
test big4k_cbrt      ... bench:      10,941 ns/iter (+/- 240)
test big4k_nth_10    ... bench:      11,923 ns/iter (+/- 393)
test big4k_nth_100   ... bench:      10,137 ns/iter (+/- 476)
test big4k_nth_1000  ... bench:      38,089 ns/iter (+/- 1,018)
test big4k_nth_10000 ... bench:          19 ns/iter (+/- 0)
test big4k_sqrt      ... bench:       8,525 ns/iter (+/- 346)
test big64_cbrt      ... bench:          57 ns/iter (+/- 1)
test big64_nth_10    ... bench:          63 ns/iter (+/- 2)
test big64_sqrt      ... bench:          30 ns/iter (+/- 1)

test result: ok. 0 passed; 0 failed; 0 ignored; 21 measured; 0 filtered out; finished in 18.99s

     Running unittests (target/release/deps/shootout_pidigits-eb03cce354523a01)

SmallVec with just union:

     Running unittests (target/release/deps/bigint-26abe6e85fc086fc)

running 53 tests
test divide_0             ... bench:          87 ns/iter (+/- 1)
test divide_1             ... bench:       1,628 ns/iter (+/- 60)
test divide_2             ... bench:     101,569 ns/iter (+/- 5,211)
test divide_big_little    ... bench:      15,712 ns/iter (+/- 716)
test fac_to_string        ... bench:         903 ns/iter (+/- 42)
test factorial_100        ... bench:         952 ns/iter (+/- 27)
test fib2_100             ... bench:       1,618 ns/iter (+/- 33)
test fib2_1000            ... bench:      14,994 ns/iter (+/- 209)
test fib2_10000           ... bench:     504,678 ns/iter (+/- 18,848)
test fib_100              ... bench:       1,157 ns/iter (+/- 16)
test fib_1000             ... bench:       9,805 ns/iter (+/- 102)
test fib_10000            ... bench:     250,294 ns/iter (+/- 22,379)
test fib_to_string        ... bench:         134 ns/iter (+/- 2)
test from_str_radix_02    ... bench:       2,580 ns/iter (+/- 53)
test from_str_radix_08    ... bench:         992 ns/iter (+/- 49)
test from_str_radix_10    ... bench:       1,046 ns/iter (+/- 76)
test from_str_radix_16    ... bench:       1,150 ns/iter (+/- 96)
test from_str_radix_36    ... bench:         991 ns/iter (+/- 13)
test hash                 ... bench:      67,101 ns/iter (+/- 2,042)
test iter_u32_digits      ... bench:          61 ns/iter (+/- 1)
test iter_u64_digits      ... bench:          47 ns/iter (+/- 1)
test modpow               ... bench:   5,164,328 ns/iter (+/- 115,802)
test modpow_even          ... bench:   9,540,707 ns/iter (+/- 441,803)
test multiply_0           ... bench:          60 ns/iter (+/- 1)
test multiply_1           ... bench:       4,088 ns/iter (+/- 105)
test multiply_2           ... bench:     263,439 ns/iter (+/- 8,241)
test multiply_3           ... bench:     602,739 ns/iter (+/- 8,603)
test pow_bench            ... bench:   2,266,011 ns/iter (+/- 38,379)
test pow_bench_1e1000     ... bench:       1,284 ns/iter (+/- 10)
test pow_bench_1e10000    ... bench:      44,116 ns/iter (+/- 377)
test pow_bench_1e100000   ... bench:   1,379,432 ns/iter (+/- 32,865)
test pow_bench_bigexp     ... bench:   2,343,012 ns/iter (+/- 169,785)
test rand_1009            ... bench:          95 ns/iter (+/- 3)
test rand_131072          ... bench:       4,863 ns/iter (+/- 60)
test rand_2048            ... bench:         124 ns/iter (+/- 5)
test rand_256             ... bench:          54 ns/iter (+/- 5)
test rand_4096            ... bench:         221 ns/iter (+/- 10)
test rand_64              ... bench:          38 ns/iter (+/- 3)
test rand_65536           ... bench:       2,471 ns/iter (+/- 21)
test rand_8192            ... bench:         348 ns/iter (+/- 5)
test remainder_0          ... bench:          82 ns/iter (+/- 1)
test remainder_1          ... bench:       1,526 ns/iter (+/- 102)
test remainder_2          ... bench:      99,220 ns/iter (+/- 5,314)
test remainder_big_little ... bench:      15,207 ns/iter (+/- 224)
test shl                  ... bench:       1,522 ns/iter (+/- 81)
test shr                  ... bench:       2,002 ns/iter (+/- 83)
test to_str_radix_02      ... bench:       2,255 ns/iter (+/- 70)
test to_str_radix_08      ... bench:         846 ns/iter (+/- 65)
test to_str_radix_10      ... bench:       2,417 ns/iter (+/- 133)
test to_str_radix_16      ... bench:         634 ns/iter (+/- 18)
test to_str_radix_36      ... bench:       4,870 ns/iter (+/- 165)
test to_u32_digits        ... bench:          98 ns/iter (+/- 4)
test to_u64_digits        ... bench:          40 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 53 measured; 0 filtered out; finished in 60.33s

     Running unittests (target/release/deps/factorial-a0c5d54a68c2d509)

running 4 tests
test factorial_div_biguint ... bench:     834,046 ns/iter (+/- 11,679)
test factorial_div_u32     ... bench:     831,354 ns/iter (+/- 13,294)
test factorial_mul_biguint ... bench:      47,346 ns/iter (+/- 1,127)
test factorial_mul_u32     ... bench:      42,279 ns/iter (+/- 1,034)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 4.29s

     Running unittests (target/release/deps/gcd-99b9c7321f43a153)

running 8 tests
test gcd_euclid_0064 ... bench:         664 ns/iter (+/- 14)
test gcd_euclid_0256 ... bench:      19,528 ns/iter (+/- 832)
test gcd_euclid_1024 ... bench:     110,668 ns/iter (+/- 1,670)
test gcd_euclid_4096 ... bench:     769,345 ns/iter (+/- 27,536)
test gcd_stein_0064  ... bench:       2,195 ns/iter (+/- 44)
test gcd_stein_0256  ... bench:       6,811 ns/iter (+/- 205)
test gcd_stein_1024  ... bench:      30,518 ns/iter (+/- 1,127)
test gcd_stein_4096  ... bench:     190,140 ns/iter (+/- 1,359)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 4.91s

     Running unittests (target/release/deps/roots-24bad08b25f9ae16)

running 21 tests
test big1k_cbrt      ... bench:       2,683 ns/iter (+/- 132)
test big1k_nth_10    ... bench:       2,586 ns/iter (+/- 210)
test big1k_nth_100   ... bench:       1,100 ns/iter (+/- 63)
test big1k_nth_1000  ... bench:       4,852 ns/iter (+/- 182)
test big1k_nth_10000 ... bench:          11 ns/iter (+/- 0)
test big1k_sqrt      ... bench:       1,945 ns/iter (+/- 126)
test big2k_cbrt      ... bench:       4,993 ns/iter (+/- 81)
test big2k_nth_10    ... bench:       4,971 ns/iter (+/- 410)
test big2k_nth_100   ... bench:       5,033 ns/iter (+/- 115)
test big2k_nth_1000  ... bench:       6,617 ns/iter (+/- 208)
test big2k_nth_10000 ... bench:          11 ns/iter (+/- 1)
test big2k_sqrt      ... bench:       4,774 ns/iter (+/- 170)
test big4k_cbrt      ... bench:      10,598 ns/iter (+/- 147)
test big4k_nth_10    ... bench:      12,714 ns/iter (+/- 197)
test big4k_nth_100   ... bench:      10,966 ns/iter (+/- 366)
test big4k_nth_1000  ... bench:      40,790 ns/iter (+/- 353)
test big4k_nth_10000 ... bench:          11 ns/iter (+/- 0)
test big4k_sqrt      ... bench:       8,919 ns/iter (+/- 66)
test big64_cbrt      ... bench:          31 ns/iter (+/- 0)
test big64_nth_10    ... bench:          38 ns/iter (+/- 0)
test big64_sqrt      ... bench:          13 ns/iter (+/- 1)

test result: ok. 0 passed; 0 failed; 0 ignored; 21 measured; 0 filtered out; finished in 18.69s

     Running unittests (target/release/deps/shootout_pidigits-2d80642dadfa5483)

SmallVec with no features:

     Running unittests (target/release/deps/bigint-91fb6a4c977331da)

running 53 tests
test divide_0             ... bench:          96 ns/iter (+/- 2)
test divide_1             ... bench:       1,711 ns/iter (+/- 68)
test divide_2             ... bench:     101,510 ns/iter (+/- 4,070)
test divide_big_little    ... bench:      15,283 ns/iter (+/- 433)
test fac_to_string        ... bench:         934 ns/iter (+/- 24)
test factorial_100        ... bench:       1,675 ns/iter (+/- 59)
test fib2_100             ... bench:       1,474 ns/iter (+/- 68)
test fib2_1000            ... bench:      14,773 ns/iter (+/- 374)
test fib2_10000           ... bench:     505,756 ns/iter (+/- 14,078)
test fib_100              ... bench:         900 ns/iter (+/- 27)
test fib_1000             ... bench:       9,203 ns/iter (+/- 241)
test fib_10000            ... bench:     267,626 ns/iter (+/- 31,114)
test fib_to_string        ... bench:         145 ns/iter (+/- 15)
test from_str_radix_02    ... bench:       2,561 ns/iter (+/- 101)
test from_str_radix_08    ... bench:       1,003 ns/iter (+/- 34)
test from_str_radix_10    ... bench:         977 ns/iter (+/- 49)
test from_str_radix_16    ... bench:       1,128 ns/iter (+/- 56)
test from_str_radix_36    ... bench:         980 ns/iter (+/- 78)
test hash                 ... bench:      66,736 ns/iter (+/- 4,335)
test iter_u32_digits      ... bench:          63 ns/iter (+/- 0)
test iter_u64_digits      ... bench:          46 ns/iter (+/- 5)
test modpow               ... bench:   5,095,925 ns/iter (+/- 130,135)
test modpow_even          ... bench:   9,875,323 ns/iter (+/- 260,819)
test multiply_0           ... bench:          52 ns/iter (+/- 1)
test multiply_1           ... bench:       4,123 ns/iter (+/- 147)
test multiply_2           ... bench:     261,917 ns/iter (+/- 8,251)
test multiply_3           ... bench:     613,228 ns/iter (+/- 18,154)
test pow_bench            ... bench:   2,279,671 ns/iter (+/- 64,637)
test pow_bench_1e1000     ... bench:       1,301 ns/iter (+/- 67)
test pow_bench_1e10000    ... bench:      46,547 ns/iter (+/- 1,225)
test pow_bench_1e100000   ... bench:   1,405,682 ns/iter (+/- 58,574)
test pow_bench_bigexp     ... bench:   2,348,941 ns/iter (+/- 69,577)
test rand_1009            ... bench:         103 ns/iter (+/- 3)
test rand_131072          ... bench:       5,490 ns/iter (+/- 1,021)
test rand_2048            ... bench:         147 ns/iter (+/- 5)
test rand_256             ... bench:          56 ns/iter (+/- 2)
test rand_4096            ... bench:         237 ns/iter (+/- 10)
test rand_64              ... bench:          47 ns/iter (+/- 2)
test rand_65536           ... bench:       2,797 ns/iter (+/- 47)
test rand_8192            ... bench:         400 ns/iter (+/- 13)
test remainder_0          ... bench:          90 ns/iter (+/- 4)
test remainder_1          ... bench:       1,633 ns/iter (+/- 118)
test remainder_2          ... bench:     102,421 ns/iter (+/- 2,044)
test remainder_big_little ... bench:      15,523 ns/iter (+/- 704)
test shl                  ... bench:       1,653 ns/iter (+/- 59)
test shr                  ... bench:       2,135 ns/iter (+/- 93)
test to_str_radix_02      ... bench:       2,273 ns/iter (+/- 86)
test to_str_radix_08      ... bench:         845 ns/iter (+/- 15)
test to_str_radix_10      ... bench:       2,484 ns/iter (+/- 76)
test to_str_radix_16      ... bench:         669 ns/iter (+/- 20)
test to_str_radix_36      ... bench:       4,960 ns/iter (+/- 82)
test to_u32_digits        ... bench:          93 ns/iter (+/- 2)
test to_u64_digits        ... bench:          41 ns/iter (+/- 1)

test result: ok. 0 passed; 0 failed; 0 ignored; 53 measured; 0 filtered out; finished in 76.23s

     Running unittests (target/release/deps/factorial-6e59df262cd67221)

running 4 tests
test factorial_div_biguint ... bench:     854,949 ns/iter (+/- 28,634)
test factorial_div_u32     ... bench:     833,860 ns/iter (+/- 20,438)
test factorial_mul_biguint ... bench:      48,744 ns/iter (+/- 1,457)
test factorial_mul_u32     ... bench:      43,355 ns/iter (+/- 1,926)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 8.89s

     Running unittests (target/release/deps/gcd-493ca256b6210eab)

running 8 tests
test gcd_euclid_0064 ... bench:         762 ns/iter (+/- 13)
test gcd_euclid_0256 ... bench:      19,251 ns/iter (+/- 1,404)
test gcd_euclid_1024 ... bench:     110,670 ns/iter (+/- 4,248)
test gcd_euclid_4096 ... bench:     754,299 ns/iter (+/- 35,581)
test gcd_stein_0064  ... bench:       2,044 ns/iter (+/- 58)
test gcd_stein_0256  ... bench:       7,259 ns/iter (+/- 335)
test gcd_stein_1024  ... bench:      30,980 ns/iter (+/- 578)
test gcd_stein_4096  ... bench:     193,098 ns/iter (+/- 3,945)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 9.81s

     Running unittests (target/release/deps/roots-4a8363865441b0bc)

running 21 tests
test big1k_cbrt      ... bench:       2,664 ns/iter (+/- 40)
test big1k_nth_10    ... bench:       2,540 ns/iter (+/- 134)
test big1k_nth_100   ... bench:         960 ns/iter (+/- 19)
test big1k_nth_1000  ... bench:       4,690 ns/iter (+/- 93)
test big1k_nth_10000 ... bench:           7 ns/iter (+/- 0)
test big1k_sqrt      ... bench:       2,157 ns/iter (+/- 48)
test big2k_cbrt      ... bench:       5,102 ns/iter (+/- 50)
test big2k_nth_10    ... bench:       4,885 ns/iter (+/- 71)
test big2k_nth_100   ... bench:       4,483 ns/iter (+/- 91)
test big2k_nth_1000  ... bench:       6,459 ns/iter (+/- 70)
test big2k_nth_10000 ... bench:           7 ns/iter (+/- 0)
test big2k_sqrt      ... bench:       5,022 ns/iter (+/- 56)
test big4k_cbrt      ... bench:      11,060 ns/iter (+/- 118)
test big4k_nth_10    ... bench:      12,851 ns/iter (+/- 171)
test big4k_nth_100   ... bench:      10,636 ns/iter (+/- 208)
test big4k_nth_1000  ... bench:      40,405 ns/iter (+/- 428)
test big4k_nth_10000 ... bench:           7 ns/iter (+/- 0)
test big4k_sqrt      ... bench:       9,336 ns/iter (+/- 133)
test big64_cbrt      ... bench:          30 ns/iter (+/- 4)
test big64_nth_10    ... bench:          39 ns/iter (+/- 17)
test big64_sqrt      ... bench:          12 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 21 measured; 0 filtered out; finished in 13.61s

     Running unittests (target/release/deps/shootout_pidigits-3a0f37269e8cd190)

carado avatar Jul 01 '21 22:07 carado

If BigInt was defined as an enum:

enum BigInt {
  Zero,
  Small(isize),
  Positive(BigUint),
  Negative(BigUint),
}

Then it would have the same size as before (4 words) but catch the common case of the number fitting in a single digit. This is how arbitrary precision integers are represented in Haskell. (Well, not quite but almost.)

lemmih avatar Jul 07 '21 13:07 lemmih

I have some "real-world" benchmark data, btw. I implemented an algorithm for generating random polygons. The code is polymorphing and each run is using the same test input. All of the input numbers fit in an isize. Here are the criterion results from my laptop (M1) and desktop (3950X):

M1 3950X
isize 2.1ms (1x) 2.6 ms (1x)
BigInt 56ms (26x) 39ms (15x)
BigRational 1.66s (783x) 1.63s (635x)

I don't know why BigRational is nearly three orders of magnitude slower than isize. My gut is telling me it should be possible to bring both BigInt and BigRational within one order of magnitude of the performance of isize.

lemmih avatar Jul 07 '21 15:07 lemmih

Extra details: The algorithm primarily does line intersection checks which involve two multiplications and four subtractions. The code for isize checks for overflows and handles them correctly. The benchmark code is here: https://github.com/rgeometry/rgeometry/blob/feature-faster-high-precision/benches/two_opt.rs

lemmih avatar Jul 07 '21 15:07 lemmih

I don't know why BigRational is nearly three orders of magnitude slower than isize. My gut is telling me it should be possible to bring both BigInt and BigRational within one order of magnitude of the performance of isize.

I'd guess BigRational is so slow because nearly every operation requires preforming a GCD or LCM operation to cancel common factors in the numerator/denominator.

programmerjake avatar Jul 07 '21 15:07 programmerjake

I don't know why BigRational is nearly three orders of magnitude slower than isize. My gut is telling me it should be possible to bring both BigInt and BigRational within one order of magnitude of the performance of isize.

I'd guess BigRational is so slow because nearly every operation requires preforming a GCD or LCM operation to cancel common factors in the numerator/denominator.

That makes sense. In my benchmark, the denominator is always 1, though, so not much work is actually required. I'll benchmark against rug and ramp to see how they compare on rationals.

lemmih avatar Jul 07 '21 16:07 lemmih