coreutils
coreutils copied to clipboard
Speed up factorisation for 128-bit integers
Considerably speeds up hard cases of factorisation for 128-bit integers. Should be comparable or faster than GNU factor for values in that range. Fixes #1456 and #1559 can also be closed.
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
did you run some benchmarks ? we like hyperfine for this ;)
i just tried and I am getting:
$ seq 2 10000000 > /tmp/a
$ hyperfine "./target/release/coreutils.old factor < /tmp/a" "./target/release/coreutils factor < /tmp/a" "/usr/bin/factor < /tmp/a"
Benchmark 1: ./target/release/coreutils.old factor < /tmp/a
Time (mean ± σ): 15.811 s ± 1.055 s [User: 14.924 s, System: 0.885 s]
Range (min … max): 13.540 s … 17.561 s 10 runs
Benchmark 2: ./target/release/coreutils factor < /tmp/a
Time (mean ± σ): 25.513 s ± 0.409 s [User: 24.629 s, System: 0.882 s]
Range (min … max): 24.868 s … 26.120 s 10 runs
Benchmark 3: /usr/bin/factor < /tmp/a
Time (mean ± σ): 1.604 s ± 0.037 s [User: 1.556 s, System: 0.047 s]
Range (min … max): 1.546 s … 1.666 s 10 runs
Summary
/usr/bin/factor < /tmp/a ran
9.86 ± 0.70 times faster than ./target/release/coreutils.old factor < /tmp/a
15.91 ± 0.45 times faster than ./target/release/coreutils factor < /tmp/a
GNU testsuite comparison:
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
I edited it to use a 64-bit which will be slightly faster for the interval, but it still probably won't be faster than GNU factor for the interval checked (0;10^7).
The reason is trial division is faster than pollard rho for small values, but throwing more trial division at most 128-bit composites is actually a complete waste so I don't bother to optimise for that.
Note that half of all 128-bit composites are greater than 2^127. So a benchmark for the average input value would be much more accurate to test.
The benchmark I ran was random input of 128-bit integers generated by openssl, which unfortunately is quite a bit harder to test when factor doesn't support hexadecimal input. (However, my algorithmically identical fork does. which is what I benchmarked).
Apparently the code was still using the old algorithm plus the new one. This should be considerably faster, but still not for the smallest integers. The crossover to beat GNU factor appears to be around 2^50.
Using the time program, it seems to run in 11t GNU factor for 2;10^7. But this improves to 3t around 2^32 and equals out at 2^49. Beyond that it tends to be faster than GNU factor.
GNU testsuite comparison:
GNU test failed: tests/factor/factor. tests/factor/factor is passing on 'main'. Maybe you have to rebase?
the ci is quite unhappy :)
I think the minimum rust version is 1.84, since machine-prime uses const static arrays. Other than that, I think the issue is with code formatting, but I have no idea how to fix that since I ran cargo fmt before committing.
Apparently it fails the check that factors for larger than 2^128.
Factorisation of many numbers beyond 2^128, like 2^128+1 or 2^128+25, 2^128+35 fails. But this is an issue from the num-prime library, apparently it just gives up even though pollard-rho will eventually succeed.
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
GNU testsuite comparison:
Skipping an intermittent issue tests/misc/usage_vs_getopt (passes in this run but fails in the 'main' branch)
Also, you need to commit the Cargo.lock file
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
any update? thanks
any update? thanks
Yes, it's slightly faster now, the code was simplified. Realistically beating GNU factor for tiny integers would require precomputing large prime_inverse tables which I am not interested in. As is, on my machine this implementation is faster than the current one in all intervals.
I haven't bothered to make corrections for the CI since I have little free time now, and last I checked building for termux is going to fail until they update their rust toolchain.
New benchmarks, using hyperfine
New version
Benchmark 1: seq 2 1000000 | target/release/factor > /tmp/a
Time (mean ± σ): 3.957 s ± 0.031 s [User: 2.819 s, System: 1.152 s]
Range (min … max): 3.929 s … 4.022 s 10 runs
Benchmark 1: seq 4294967296 4295967296 | target/release/factor > /tmp/a
Time (mean ± σ): 5.285 s ± 0.274 s [User: 4.067 s, System: 1.231 s]
Range (min … max): 5.103 s … 5.991 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 1: seq 9223372036854775808 9223372036854875808 | target/release/factor > /tmp/a
Time (mean ± σ): 2.703 s ± 0.011 s [User: 2.547 s, System: 0.157 s]
Range (min … max): 2.692 s … 2.732 s 10 runs
Benchmark 1: seq 18446744073709551616 18446744073709651616 | target/release/factor > /tmp/a
Time (mean ± σ): 4.123 s ± 0.080 s [User: 3.956 s, System: 0.162 s]
Range (min … max): 4.053 s … 4.277 s 10 runs
Benchmark 1: seq 1267650600228229401496703205376 1267650600228229401496703206376 | target/release/factor > /tmp/a
Time (mean ± σ): 11.681 s ± 0.129 s [User: 11.623 s, System: 0.014 s]
Range (min … max): 11.519 s … 11.857 s 10 runs
Benchmark 1: seq 170141183460469231731687303715884105728 170141183460469231731687303715884105828 | target/release/factor > /tmp/a
Time (mean ± σ): 37.721 s ± 0.456 s [User: 37.565 s, System: 0.020 s]
Range (min … max): 37.072 s … 38.755 s 10 runs
Old version
Benchmark 1: seq 2 1000000 | target/release/factor > /tmp/a
Time (mean ± σ): 4.336 s ± 0.060 s [User: 3.089 s, System: 1.260 s]
Range (min … max): 4.276 s … 4.463 s 10 runs
Benchmark 1: seq 4294967296 4295967296 | target/release/factor > /tmp/a
Time (mean ± σ): 9.089 s ± 0.157 s [User: 7.692 s, System: 1.412 s]
Range (min … max): 8.916 s … 9.376 s 10 runs
Benchmark 1: seq 9223372036854775808 9223372036854875808 | target/release/factor > /tmp/a
Time (mean ± σ): 28.509 s ± 0.205 s [User: 28.215 s, System: 0.213 s]
Range (min … max): 28.273 s … 28.953 s 10 runs
Benchmark 1: seq 18446744073709551616 18446744073709651616 | target/release/factor > /tmp/a
Time (mean ± σ): 34.087 s ± 0.326 s [User: 33.756 s, System: 0.235 s]
Range (min … max): 33.655 s … 34.523 s 10 runs
Benchmark 1: seq 1267650600228229401496703205376 1267650600228229401496703206376 | target/release/factor > /tmp/a
Time (mean ± σ): 133.115 s ± 5.023 s [User: 132.686 s, System: 0.054 s]
Range (min … max): 121.333 s … 138.788 s 10 runs
Took too long but I got 1472 for a single run
1472.008
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
please reopen when it is ready