coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

Speed up factorisation for 128-bit integers

Open JASory opened this issue 8 months ago • 21 comments

Considerably speeds up hard cases of factorisation for 128-bit integers. Should be comparable or faster than GNU factor for values in that range. Fixes #1456 and #1559 can also be closed.

JASory avatar Mar 07 '25 16:03 JASory

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Mar 07 '25 17:03 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Mar 07 '25 17:03 github-actions[bot]

did you run some benchmarks ? we like hyperfine for this ;)

sylvestre avatar Mar 07 '25 20:03 sylvestre

i just tried and I am getting:

$ seq 2 10000000 > /tmp/a
$ hyperfine "./target/release/coreutils.old factor < /tmp/a" "./target/release/coreutils factor < /tmp/a" "/usr/bin/factor < /tmp/a"
Benchmark 1: ./target/release/coreutils.old factor < /tmp/a
  Time (mean ± σ):     15.811 s ±  1.055 s    [User: 14.924 s, System: 0.885 s]
  Range (min … max):   13.540 s … 17.561 s    10 runs

Benchmark 2: ./target/release/coreutils factor < /tmp/a
  Time (mean ± σ):     25.513 s ±  0.409 s    [User: 24.629 s, System: 0.882 s]
  Range (min … max):   24.868 s … 26.120 s    10 runs

Benchmark 3: /usr/bin/factor < /tmp/a
  Time (mean ± σ):      1.604 s ±  0.037 s    [User: 1.556 s, System: 0.047 s]
  Range (min … max):    1.546 s …  1.666 s    10 runs

Summary
  /usr/bin/factor < /tmp/a ran
    9.86 ± 0.70 times faster than ./target/release/coreutils.old factor < /tmp/a
   15.91 ± 0.45 times faster than ./target/release/coreutils factor < /tmp/a

sylvestre avatar Mar 07 '25 21:03 sylvestre

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Mar 07 '25 21:03 github-actions[bot]

I edited it to use a 64-bit which will be slightly faster for the interval, but it still probably won't be faster than GNU factor for the interval checked (0;10^7).

The reason is trial division is faster than pollard rho for small values, but throwing more trial division at most 128-bit composites is actually a complete waste so I don't bother to optimise for that.

Note that half of all 128-bit composites are greater than 2^127. So a benchmark for the average input value would be much more accurate to test.

The benchmark I ran was random input of 128-bit integers generated by openssl, which unfortunately is quite a bit harder to test when factor doesn't support hexadecimal input. (However, my algorithmically identical fork does. which is what I benchmarked).

JASory avatar Mar 08 '25 00:03 JASory

Apparently the code was still using the old algorithm plus the new one. This should be considerably faster, but still not for the smallest integers. The crossover to beat GNU factor appears to be around 2^50.

Using the time program, it seems to run in 11t GNU factor for 2;10^7. But this improves to 3t around 2^32 and equals out at 2^49. Beyond that it tends to be faster than GNU factor.

JASory avatar Mar 08 '25 00:03 JASory

GNU testsuite comparison:

GNU test failed: tests/factor/factor. tests/factor/factor is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Mar 08 '25 06:03 github-actions[bot]

the ci is quite unhappy :)

sylvestre avatar Mar 08 '25 09:03 sylvestre

I think the minimum rust version is 1.84, since machine-prime uses const static arrays. Other than that, I think the issue is with code formatting, but I have no idea how to fix that since I ran cargo fmt before committing.

Apparently it fails the check that factors for larger than 2^128.

JASory avatar Mar 08 '25 09:03 JASory

Factorisation of many numbers beyond 2^128, like 2^128+1 or 2^128+25, 2^128+35 fails. But this is an issue from the num-prime library, apparently it just gives up even though pollard-rho will eventually succeed.

JASory avatar Mar 08 '25 10:03 JASory

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Mar 08 '25 10:03 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Mar 08 '25 11:03 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Mar 09 '25 08:03 github-actions[bot]

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/usage_vs_getopt (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Mar 09 '25 14:03 github-actions[bot]

Also, you need to commit the Cargo.lock file

RenjiSann avatar Mar 11 '25 09:03 RenjiSann

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Mar 14 '25 15:03 github-actions[bot]

any update? thanks

sylvestre avatar Mar 23 '25 20:03 sylvestre

any update? thanks

Yes, it's slightly faster now, the code was simplified. Realistically beating GNU factor for tiny integers would require precomputing large prime_inverse tables which I am not interested in. As is, on my machine this implementation is faster than the current one in all intervals.

I haven't bothered to make corrections for the CI since I have little free time now, and last I checked building for termux is going to fail until they update their rust toolchain.

New benchmarks, using hyperfine

New version

Benchmark 1: seq 2 1000000 | target/release/factor > /tmp/a
  Time (mean ± σ):      3.957 s ±  0.031 s    [User: 2.819 s, System: 1.152 s]
  Range (min … max):    3.929 s …  4.022 s    10 runs
 
Benchmark 1: seq 4294967296 4295967296 | target/release/factor > /tmp/a
  Time (mean ± σ):      5.285 s ±  0.274 s    [User: 4.067 s, System: 1.231 s]
  Range (min … max):    5.103 s …  5.991 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 1: seq 9223372036854775808 9223372036854875808 | target/release/factor > /tmp/a
  Time (mean ± σ):      2.703 s ±  0.011 s    [User: 2.547 s, System: 0.157 s]
  Range (min … max):    2.692 s …  2.732 s    10 runs
 
Benchmark 1: seq 18446744073709551616 18446744073709651616 | target/release/factor > /tmp/a
  Time (mean ± σ):      4.123 s ±  0.080 s    [User: 3.956 s, System: 0.162 s]
  Range (min … max):    4.053 s …  4.277 s    10 runs
 
Benchmark 1: seq 1267650600228229401496703205376 1267650600228229401496703206376 | target/release/factor > /tmp/a
  Time (mean ± σ):     11.681 s ±  0.129 s    [User: 11.623 s, System: 0.014 s]
  Range (min … max):   11.519 s … 11.857 s    10 runs
 
Benchmark 1: seq 170141183460469231731687303715884105728 170141183460469231731687303715884105828 | target/release/factor > /tmp/a
  Time (mean ± σ):     37.721 s ±  0.456 s    [User: 37.565 s, System: 0.020 s]
  Range (min … max):   37.072 s … 38.755 s    10 runs

Old version 

Benchmark 1: seq 2 1000000 | target/release/factor > /tmp/a
  Time (mean ± σ):      4.336 s ±  0.060 s    [User: 3.089 s, System: 1.260 s]
  Range (min … max):    4.276 s …  4.463 s    10 runs
 
Benchmark 1: seq 4294967296 4295967296 | target/release/factor > /tmp/a
  Time (mean ± σ):      9.089 s ±  0.157 s    [User: 7.692 s, System: 1.412 s]
  Range (min … max):    8.916 s …  9.376 s    10 runs
 
Benchmark 1: seq 9223372036854775808 9223372036854875808 | target/release/factor > /tmp/a
  Time (mean ± σ):     28.509 s ±  0.205 s    [User: 28.215 s, System: 0.213 s]
  Range (min … max):   28.273 s … 28.953 s    10 runs
 
Benchmark 1: seq 18446744073709551616 18446744073709651616 | target/release/factor > /tmp/a
  Time (mean ± σ):     34.087 s ±  0.326 s    [User: 33.756 s, System: 0.235 s]
  Range (min … max):   33.655 s … 34.523 s    10 runs

Benchmark 1: seq 1267650600228229401496703205376 1267650600228229401496703206376 | target/release/factor > /tmp/a
  Time (mean ± σ):     133.115 s ±  5.023 s    [User: 132.686 s, System: 0.054 s]
  Range (min … max):   121.333 s … 138.788 s    10 runs

Took too long but I got 1472 for a single run
1472.008

JASory avatar Mar 24 '25 14:03 JASory

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Apr 17 '25 12:04 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Apr 17 '25 17:04 github-actions[bot]

please reopen when it is ready

sylvestre avatar Oct 06 '25 20:10 sylvestre