math
math copied to clipboard
Prime Number Functions
An initial addition of two prime number functions:
prime_sieve
is a linear prime sieve algorithm. Currently benchmarks O(n).
prime_range
returns all the prime numbers in the range [lower_bound, upper_bound]. For the first 1000 primes it can use the already built in lookup tables; outside of that it will call prime_sieve
. This is the function intended for end users to call.
Future additions would include a Miller-Rabin primality test.
FYI there's a Miller Rabin test in Multiprecision already.
@mborland : OMG thank you; we've needed a prime sieve for so long.
@mborland : I would definitely change the name from prime_functions.hpp
to prime_sieve.hpp
; the first is not very memorable.
@mborland : IIRC, the Joy of Factoring also creates bitsets which set prime at bit i and has it zeroed at composites. Is this a useful API?
@NAThompson That sounds like wheel factorization. That method is added to the sieve of Eratosthenes, but would not be applicable here.
Would it be more ergonomic to have prime_range
use a back_inserter
, but prime_sieve
behave more like std::fill
?
std::vector<int64_t> primes(1000); // gimme 1000 primes
prime_sieve(primes.begin(), primes.end());
(I need to stay in my lane here; @jeremy-murphy is much better at ergonomics.)
@mborland : I have begun a performance comparison with Kim Walish's prime sieve. Here is the code:
#include <vector>
#include <boost/math/special_functions/prime_sieve.hpp>
#include <benchmark/benchmark.h>
#include <primesieve.hpp>
template <class Z>
void prime_sieve(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for(auto _ : state)
{
std::vector<Z> primes;
benchmark::DoNotOptimize(boost::math::prime_sieve(static_cast<Z>(2), upper, std::back_inserter(primes)));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(prime_sieve, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
template <class Z>
void kimwalish_primes(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for (auto _ : state)
{
std::vector<Z> primes;
primesieve::generate_primes(upper, &primes);
benchmark::DoNotOptimize(primes.back());
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(kimwalish_primes, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
BENCHMARK_MAIN();
and the results:
Run on (16 X 2300 MHz CPU s)
CPU Caches:
L1 Data 32K (x8)
L1 Instruction 32K (x8)
L2 Unified 262K (x8)
L3 Unified 16777K (x1)
Load Average: 2.49, 1.81, 1.78
----------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------
prime_sieve<int64_t>/2 348 ns 348 ns 1939182
prime_sieve<int64_t>/4 463 ns 462 ns 1467524
prime_sieve<int64_t>/8 603 ns 602 ns 1174654
prime_sieve<int64_t>/16 717 ns 717 ns 956781
prime_sieve<int64_t>/32 907 ns 907 ns 732693
prime_sieve<int64_t>/64 1146 ns 1146 ns 604673
prime_sieve<int64_t>/128 1410 ns 1410 ns 494574
prime_sieve<int64_t>/256 2190 ns 2190 ns 319042
prime_sieve<int64_t>/512 3838 ns 3838 ns 186498
prime_sieve<int64_t>/1024 6769 ns 6769 ns 103041
prime_sieve<int64_t>/2048 12757 ns 12756 ns 54359
prime_sieve<int64_t>/4096 25295 ns 25291 ns 26912
prime_sieve<int64_t>/8192 50898 ns 50880 ns 13661
prime_sieve<int64_t>/16384 101938 ns 101936 ns 6707
prime_sieve<int64_t>/32768 201957 ns 201936 ns 3455
prime_sieve<int64_t>/65536 396735 ns 396709 ns 1754
prime_sieve<int64_t>/131072 779986 ns 779933 ns 864
prime_sieve<int64_t>/262144 1534550 ns 1534233 ns 446
prime_sieve<int64_t>/524288 3020453 ns 3019390 ns 228
prime_sieve<int64_t>/1048576 6089458 ns 6089187 ns 107
prime_sieve<int64_t>/2097152 14131817 ns 14130295 ns 44
prime_sieve<int64_t>/4194304 30953282 ns 30952476 ns 21
prime_sieve<int64_t>_BigO 0.33 NlgN 0.33 NlgN
prime_sieve<int64_t>_RMS 9 % 9 %
kimwalish_primes<int64_t>/2 276 ns 276 ns 2263102
kimwalish_primes<int64_t>/4 280 ns 280 ns 2617321
kimwalish_primes<int64_t>/8 279 ns 279 ns 2461625
kimwalish_primes<int64_t>/16 289 ns 289 ns 2526273
kimwalish_primes<int64_t>/32 296 ns 296 ns 2348788
kimwalish_primes<int64_t>/64 287 ns 287 ns 2397589
kimwalish_primes<int64_t>/128 357 ns 357 ns 2088954
kimwalish_primes<int64_t>/256 368 ns 368 ns 1894662
kimwalish_primes<int64_t>/512 459 ns 459 ns 1601636
kimwalish_primes<int64_t>/1024 2177 ns 2176 ns 319871
kimwalish_primes<int64_t>/2048 2535 ns 2534 ns 274488
kimwalish_primes<int64_t>/4096 3377 ns 3377 ns 206241
kimwalish_primes<int64_t>/8192 4527 ns 4525 ns 155072
kimwalish_primes<int64_t>/16384 7015 ns 7014 ns 98151
kimwalish_primes<int64_t>/32768 10982 ns 10981 ns 63122
kimwalish_primes<int64_t>/65536 19631 ns 19627 ns 35826
kimwalish_primes<int64_t>/131072 35359 ns 35356 ns 19769
kimwalish_primes<int64_t>/262144 66530 ns 66523 ns 10449
kimwalish_primes<int64_t>/524288 126772 ns 126769 ns 4532
kimwalish_primes<int64_t>/1048576 246035 ns 245968 ns 2837
kimwalish_primes<int64_t>/2097152 477212 ns 477177 ns 1450
kimwalish_primes<int64_t>/4194304 1109480 ns 1109200 ns 621
kimwalish_primes<int64_t>_BigO 0.01 NlgN 0.01 NlgN
kimwalish_primes<int64_t>_RMS 11 % 11 %
So it looks like there is 1-2 orders of magnitude of performance improvement left in the boost implementation; presumably we need to find it. . .
@NAThompson It would be fairly easy to create light wrappers to the current prime_sieve
implementation like in the uni-variate statistics library.
I will have to do some digging to see where more performance can be squeezed out.
It would be fairly easy to create light wrappers to the current prime_sieve implementation like in the uni-variate statistics library.
Nah, if it's not obviously a better way to do it, I'm not interested in it.
@mborland : Also, make sure to tag your commit messages with [CI SKIP]
until we are about to merge.
I just went through The Joy of Factoring to refresh my memory on how these sieves work. I implemented algorithm 8.2 of that book, which the author claims runs in O(J*log(log(J)) time. He also references P.A. Pritchard who has given an algorithm which runs in O(J/log(log(J)) time; see A sublinear additive sieve for finding prime numbers, Comm. ACM 24, 1981.
In any case, this is how my naive implementation of algorithm 8.2 looks:
// Copyright 2020 Matt Borland
//
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt
// or copy at http://www.boost.org/LICENSE_1_0.txt)
#include <vector>
#include <boost/math/special_functions/prime_sieve.hpp>
#include <benchmark/benchmark.h>
#include <primesieve.hpp>
template <class Z>
void prime_sieve(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for(auto _ : state)
{
std::vector<Z> primes;
benchmark::DoNotOptimize(boost::math::prime_sieve(static_cast<Z>(2), upper, std::back_inserter(primes)));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(prime_sieve, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
template <class Z>
void kimwalish_primes(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for (auto _ : state)
{
std::vector<Z> primes;
primesieve::generate_primes(upper, &primes);
benchmark::DoNotOptimize(primes.back());
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(kimwalish_primes, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
template<typename Z>
std::vector<bool> joy_of_factoring_bitset(const Z J)
{
using std::sqrt;
using std::log;
// A vector of bools behaves like a bitset, but with runtime size:
std::vector<bool> P(J, true);
P[0] = false;
P[1] = false;
Z p = 2;
Z root_j = sqrt(J);
while (p <= root_j) {
Z i = p + p;
while (i <= J) {
P[i] = false;
i = i + p;
}
i = p + 1;
while (i <= root_j && P[i] == false) {
i += 1;
}
p = i;
}
return P;
}
template<typename Z>
std::vector<Z> joy_of_factoring_primes(const Z J)
{
auto P = joy_of_factoring_bitset(J);
std::vector<Z> primes;
if (J >= 355991) {
double inv_log = 1/log(J);
size_t N = J*inv_log*(1+inv_log + 2.51*inv_log*inv_log);
primes.reserve(N);
}
else if (J>=55) {
double n = static_cast<double>(J)/(log(J) - 4.0);
primes.reserve(static_cast<size_t>(n));
}
primes.emplace_back(2);
for (size_t i = 3; i < J; i += 2) {
if (P[i]) {
primes.emplace_back(i);
}
}
return primes;
}
template <class Z>
void joy_of_factoring_bm(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for (auto _ : state)
{
benchmark::DoNotOptimize(joy_of_factoring_primes(upper));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(joy_of_factoring_bm, size_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
BENCHMARK_MAIN();
And the results:
prime_sieve<int64_t>/2 352 ns 351 ns 1951056
prime_sieve<int64_t>/4 470 ns 470 ns 1481710
prime_sieve<int64_t>/8 597 ns 597 ns 1129615
prime_sieve<int64_t>/16 738 ns 738 ns 953198
prime_sieve<int64_t>/32 936 ns 936 ns 762660
prime_sieve<int64_t>/64 1145 ns 1145 ns 616909
prime_sieve<int64_t>/128 1407 ns 1407 ns 495898
prime_sieve<int64_t>/256 2247 ns 2246 ns 314915
prime_sieve<int64_t>/512 3715 ns 3713 ns 189649
prime_sieve<int64_t>/1024 6664 ns 6663 ns 103458
prime_sieve<int64_t>/2048 12745 ns 12741 ns 54189
prime_sieve<int64_t>/4096 25364 ns 25360 ns 27329
prime_sieve<int64_t>/8192 51157 ns 51151 ns 13472
prime_sieve<int64_t>/16384 102583 ns 102567 ns 6739
prime_sieve<int64_t>/32768 203596 ns 203550 ns 3428
prime_sieve<int64_t>/65536 399037 ns 399001 ns 1753
prime_sieve<int64_t>/131072 781683 ns 781580 ns 876
prime_sieve<int64_t>/262144 1538995 ns 1538545 ns 448
prime_sieve<int64_t>/524288 3095005 ns 3094665 ns 230
prime_sieve<int64_t>/1048576 6326961 ns 6326208 ns 101
prime_sieve<int64_t>/2097152 14024588 ns 14024205 ns 44
prime_sieve<int64_t>/4194304 31189328 ns 31180333 ns 21
prime_sieve<int64_t>_BigO 0.33 NlgN 0.33 NlgN
prime_sieve<int64_t>_RMS 9 % 9 %
kimwalish_primes<int64_t>/2 279 ns 279 ns 2509536
kimwalish_primes<int64_t>/4 289 ns 289 ns 2494788
kimwalish_primes<int64_t>/8 282 ns 282 ns 2466786
kimwalish_primes<int64_t>/16 313 ns 313 ns 2333162
kimwalish_primes<int64_t>/32 293 ns 293 ns 2271474
kimwalish_primes<int64_t>/64 304 ns 304 ns 2286663
kimwalish_primes<int64_t>/128 372 ns 372 ns 2002437
kimwalish_primes<int64_t>/256 383 ns 383 ns 1830046
kimwalish_primes<int64_t>/512 444 ns 443 ns 1595158
kimwalish_primes<int64_t>/1024 2178 ns 2177 ns 318499
kimwalish_primes<int64_t>/2048 2557 ns 2556 ns 271694
kimwalish_primes<int64_t>/4096 3363 ns 3363 ns 206716
kimwalish_primes<int64_t>/8192 4591 ns 4591 ns 152819
kimwalish_primes<int64_t>/16384 6976 ns 6974 ns 97478
kimwalish_primes<int64_t>/32768 11104 ns 11103 ns 62514
kimwalish_primes<int64_t>/65536 19537 ns 19534 ns 35619
kimwalish_primes<int64_t>/131072 35760 ns 35756 ns 19317
kimwalish_primes<int64_t>/262144 66588 ns 66586 ns 10404
kimwalish_primes<int64_t>/524288 128041 ns 128034 ns 5385
kimwalish_primes<int64_t>/1048576 245994 ns 245973 ns 2815
kimwalish_primes<int64_t>/2097152 481273 ns 481164 ns 1445
kimwalish_primes<int64_t>/4194304 1110850 ns 1110702 ns 618
kimwalish_primes<int64_t>_BigO 0.01 NlgN 0.01 NlgN
kimwalish_primes<int64_t>_RMS 11 % 11 %
joy_of_factoring_bm<size_t>/2 66.1 ns 66.1 ns 10413103
joy_of_factoring_bm<size_t>/4 252 ns 252 ns 2807457
joy_of_factoring_bm<size_t>/8 385 ns 385 ns 1901306
joy_of_factoring_bm<size_t>/16 508 ns 507 ns 1370265
joy_of_factoring_bm<size_t>/32 618 ns 618 ns 1111023
joy_of_factoring_bm<size_t>/64 238 ns 238 ns 2947071
joy_of_factoring_bm<size_t>/128 334 ns 334 ns 2148722
joy_of_factoring_bm<size_t>/256 532 ns 532 ns 1259151
joy_of_factoring_bm<size_t>/512 1027 ns 1027 ns 662553
joy_of_factoring_bm<size_t>/1024 2490 ns 2490 ns 281698
joy_of_factoring_bm<size_t>/2048 5211 ns 5211 ns 130022
joy_of_factoring_bm<size_t>/4096 10454 ns 10454 ns 65775
joy_of_factoring_bm<size_t>/8192 20807 ns 20805 ns 33289
joy_of_factoring_bm<size_t>/16384 41352 ns 41351 ns 16687
joy_of_factoring_bm<size_t>/32768 82184 ns 82180 ns 8354
joy_of_factoring_bm<size_t>/65536 161913 ns 161879 ns 4242
joy_of_factoring_bm<size_t>/131072 320842 ns 320755 ns 2170
joy_of_factoring_bm<size_t>/262144 640485 ns 640433 ns 1073
joy_of_factoring_bm<size_t>/524288 1289366 ns 1289235 ns 536
joy_of_factoring_bm<size_t>/1048576 2589004 ns 2588625 ns 269
joy_of_factoring_bm<size_t>/2097152 5304918 ns 5302623 ns 130
joy_of_factoring_bm<size_t>/4194304 11621459 ns 11620593 ns 59
joy_of_factoring_bm<size_t>_BigO 0.12 NlgN 0.12 NlgN
joy_of_factoring_bm<size_t>_RMS 5 % 5 %
Algorithm 8.2 is actually competitive with Kim Walish for smaller N but starts to lag for larger N.
@NAThompson I am seeing a similar result using a segmented sieve. Currently faster than kimwalish, and joy_of_factoring_bm
from your post until 524288. After that kimwalish pulls ahead.
Just ran this benchmark under perf
:
So Kim Walish's prime generator does indeed compute logarithms and fill up a table of primes; quite a bit of time in malloc
; quite a bit of time in fill
.
However if you look at the Joy of Factoring algorithm, it spends way more time in the fill
:
So there are some opportunities there; you'll see that I'm only checking odd numbers after 2, there's probably some way to extend that to making the fill faster.
@NAThompson I have found some performance improvements using C-style arrays instead of std::vector::reserve()
which makes sense given that data.
@NAThompson This latest iteration's benchmark delivered much better results:
prime_sieve<int64_t>/real_time_BigO 2.87 N 0.22 N
prime_sieve<int64_t>/real_time_RMS 11 % 27 %
@mborland : Really cool. BTW, if you're going to do threading, write a parallel execution policy; we need the ability to not use threads. Here's an example I started (but never finished since it was just not enough compute/memory reference).
@NAThompson This should fit the bill. Performance ticked down a bit to 3.07N/0.23N because I realized I was relying on the prime table to be statically linked. If you dynamically linked the previous implementation the performance was ~0.2NlgN.
Performance ticked down a bit to 3.07N/0.23N because I realized I was relying on the prime table to be statically linked.
I do recommend that the prime table not be used at all; my own preference would be to deprecate this header once your sieve is in.
@NAThompson That makes sense. I can add the notice.
@NAThompson What are your thoughts on defining a seq_prime_sieve
function instead of wrapping the parallel prime_sieve
in a block that requires an execution policy. My concern is that execution policies are a C++17 facility, and everything is currently compliant with C++11 except that.
@mborland : All the new features I'm adding require C++17; also the univariate_statistics.hpp
file requires C++-17 and uses if constexpr
to great effect, IMO.
My view is that gcc, clang and MSVC have been C++-17 compliant for some time now, so if C++-17 provides idioms that makes a function more ergonomic then we should use it. In addition, I thought that your sieve with parallel execution policies was shaping up to be one of the coolest features in all of the library. So I'm biased on that point. . .
@NAThompson Works for me. I saw the notice that C++03 is just now being deprecated so I figured I would ask. I believe that GCC 11 is going to have C++17 set as the new default.
I do recommend that the prime table not be used at all; my own preference would be to deprecate this header once your sieve is in.
Note that the internal use case for that table really does require fast table lookup of primes.
Note that the internal use case for that table really does require fast table lookup of primes.
Ok, no deprecation message, but there is hope that this will be faster than reading out of program memory.
@NAThompson A section to use as many threads as processors available has been added. With the range of benchmark maxed out at 2^30 it bench marks at 6.86 N/0.69N
I added an additional unit test for the multi-threading section and added this set of tests to the Jamfile. Are the 2 thread tests ok to send to CI or should I comment those out as well?
@mborland : What is the use case for boost::math::prime_range(100, 1000, std::back_inserter(primes));
? I understand wanting all the primes < x
, but I've never seen anyone need primes in a range.
Are the 2 thread tests ok to send to CI or should I comment those out as well?
Yeah send them in. Also, could you run it under -fsanitize=thread
and make sure it's clean?
@NAThompson I can't personally tell you a use case for prime_range
, but the functionality is available in other libraries like sympy.
Locally -fsanitize=thread
and fsanitize=address
are both clean.
@mborland : Just tried to build the docs; got the following error:
sf/number_series.qbk:325: error: Mismatched [endsect] near column 9.
@mborland : Very cool; here's the comparison to Kim Walish's sieve:
#include <vector>
#include <boost/math/special_functions/prime_sieve.hpp>
#include <benchmark/benchmark.h>
#include <primesieve.hpp>
template <class Z>
void prime_sieve(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for(auto _ : state)
{
std::vector<Z> primes;
benchmark::DoNotOptimize(boost::math::prime_sieve(std::execution::par, upper, std::back_inserter(primes)));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(prime_sieve, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity(benchmark::oN)->UseRealTime();
template <class Z>
void kimwalish_primes(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for (auto _ : state)
{
std::vector<Z> primes;
primesieve::generate_primes(upper, &primes);
benchmark::DoNotOptimize(primes.back());
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(kimwalish_primes, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity(benchmark::oN)->UseRealTime();
BENCHMARK_MAIN();
Here I force it to regress against O(N) scaling since that makes comparison easier:
prime_sieve<int64_t>/2/real_time 417 ns 416 ns 1688428
prime_sieve<int64_t>/4/real_time 590 ns 589 ns 1132537
prime_sieve<int64_t>/8/real_time 932 ns 930 ns 730881
prime_sieve<int64_t>/16/real_time 1235 ns 1229 ns 573691
prime_sieve<int64_t>/32/real_time 1477 ns 1473 ns 465590
prime_sieve<int64_t>/64/real_time 1672 ns 1667 ns 412482
prime_sieve<int64_t>/128/real_time 2017 ns 2009 ns 356051
prime_sieve<int64_t>/256/real_time 2795 ns 2786 ns 249987
prime_sieve<int64_t>/512/real_time 4226 ns 4209 ns 166142
prime_sieve<int64_t>/1024/real_time 6851 ns 6835 ns 99070
prime_sieve<int64_t>/2048/real_time 13647 ns 13622 ns 49657
prime_sieve<int64_t>/4096/real_time 24109 ns 24061 ns 28328
prime_sieve<int64_t>/8192/real_time 88496 ns 38358 ns 7544
prime_sieve<int64_t>/16384/real_time 104841 ns 40361 ns 5822
prime_sieve<int64_t>/32768/real_time 181437 ns 65559 ns 4329
prime_sieve<int64_t>/65536/real_time 281237 ns 68548 ns 2480
prime_sieve<int64_t>/131072/real_time 572515 ns 101001 ns 1238
prime_sieve<int64_t>/262144/real_time 1155111 ns 154189 ns 614
prime_sieve<int64_t>/524288/real_time 2800881 ns 294266 ns 278
prime_sieve<int64_t>/1048576/real_time 5264969 ns 445906 ns 127
prime_sieve<int64_t>/2097152/real_time 14900995 ns 1180893 ns 56
prime_sieve<int64_t>/4194304/real_time 45743068 ns 2255611 ns 18
prime_sieve<int64_t>/real_time_BigO 9.83 N 0.54 N
prime_sieve<int64_t>/real_time_RMS 61 % 16 %
kimwalish_primes<int64_t>/2/real_time 580 ns 465 ns 1461148
kimwalish_primes<int64_t>/4/real_time 507 ns 473 ns 1430308
kimwalish_primes<int64_t>/8/real_time 691 ns 550 ns 1165038
kimwalish_primes<int64_t>/16/real_time 681 ns 523 ns 935754
kimwalish_primes<int64_t>/32/real_time 481 ns 454 ns 1098076
kimwalish_primes<int64_t>/64/real_time 506 ns 490 ns 1000000
kimwalish_primes<int64_t>/128/real_time 585 ns 571 ns 1169549
kimwalish_primes<int64_t>/256/real_time 625 ns 613 ns 1066912
kimwalish_primes<int64_t>/512/real_time 753 ns 730 ns 895759
kimwalish_primes<int64_t>/1024/real_time 3281 ns 3228 ns 169253
kimwalish_primes<int64_t>/2048/real_time 3736 ns 3728 ns 184918
kimwalish_primes<int64_t>/4096/real_time 5086 ns 5074 ns 130160
kimwalish_primes<int64_t>/8192/real_time 7088 ns 7050 ns 97634
kimwalish_primes<int64_t>/16384/real_time 10945 ns 10921 ns 62913
kimwalish_primes<int64_t>/32768/real_time 17168 ns 17057 ns 40847
kimwalish_primes<int64_t>/65536/real_time 30259 ns 30192 ns 22623
kimwalish_primes<int64_t>/131072/real_time 55462 ns 55328 ns 12102
kimwalish_primes<int64_t>/262144/real_time 103074 ns 102894 ns 6650
kimwalish_primes<int64_t>/524288/real_time 195996 ns 195575 ns 3428
kimwalish_primes<int64_t>/1048576/real_time 380300 ns 379350 ns 1849
kimwalish_primes<int64_t>/2097152/real_time 863277 ns 861173 ns 776
kimwalish_primes<int64_t>/4194304/real_time 1719699 ns 1714099 ns 394
kimwalish_primes<int64_t>/real_time_BigO 0.41 N 0.41 N
kimwalish_primes<int64_t>/real_time_RMS 7 % 7 %