Implement `safegcd-bounds`
This is the tracking issue for this TODO: https://github.com/RustCrypto/crypto-bigint/blob/ae30093/src/modular/safegcd.rs#L341
The bounds we currently implement for Bernstein-Yang are the ones described in the paper, which proves that the algorithm will always converge within the prescribed bounds. However, the bounds are overly conservative and not optimal:
https://github.com/sipa/safegcd-bounds
There are two improvements available: a tighter bounds calculation we can use, and an improved divsteps algorithm (hddivsteps) which itself has better bounds than the original divsteps algorithm.
https://github.com/sipa/safegcd-bounds
Even more details here.
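To make the difference between the two variants concrete, here is a minimal variable-time sketch of the divstep transition. It is an illustration only, not the crate's constant-time implementation: `divsteps_gcd` and its `two_delta` parameter are hypothetical names, and delta is stored doubled so that the hddivsteps starting value of 1/2 stays an integer (`two_delta = 2` gives the original divsteps, `two_delta = 1` gives hddivsteps).

```rust
/// Toy, variable-time divsteps loop (hypothetical helper, for illustration).
/// `two_delta` holds 2 * delta: pass 2 for original divsteps (delta = 1),
/// or 1 for hddivsteps (delta = 1/2).
/// Returns (|f|, steps) once g reaches 0; |f| is then gcd of the inputs.
fn divsteps_gcd(mut f: i128, mut g: i128, mut two_delta: i64) -> (i128, u32) {
    assert!((f & 1) == 1, "f must be odd");
    let mut steps = 0;
    while g != 0 {
        if two_delta > 0 && (g & 1) == 1 {
            // (delta, f, g) -> (1 - delta, g, (g - f) / 2)
            let t = (g - f) / 2; // g and f are both odd, so this is exact
            f = g;
            g = t;
            two_delta = 2 - two_delta;
        } else {
            // (delta, f, g) -> (1 + delta, f, (g + (g mod 2) * f) / 2)
            if (g & 1) == 1 {
                g += f;
            }
            g /= 2; // g is even here, so this is exact
            two_delta += 2;
        }
        steps += 1;
    }
    (f.abs(), steps)
}
```

Calling `divsteps_gcd(21, 14, 2)` and `divsteps_gcd(21, 14, 1)` both yield a gcd of 7; only the starting delta differs between the two variants.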
The current impl results in the following bounds (obtained by just printing out the iteration count while running the crate test suite, sampling the smallest and biggest iteration counts seen):
f bits: 250, g bits: 254 => iterations: 735
f bits: 256, g bits: 256 => iterations: 741
f bits: 1022, g bits: 1021 => iterations: 2949
f bits: 1024, g bits: 1024 => iterations: 2954
f bits: 2043, g bits: 2042 => iterations: 5892
f bits: 2048, g bits: 2048 => iterations: 5906
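These numbers match the closed-form bound from the Bernstein-Yang paper applied to `d = max(f_bits, g_bits)`. A sketch of that calculation follows; the function name is hypothetical and this is not necessarily how the crate spells it.

```rust
/// Iteration bound from the Bernstein-Yang paper for inputs of at most
/// `d = max(f_bits, g_bits)` bits (hypothetical helper name).
const fn bernstein_yang_iterations(f_bits: u32, g_bits: u32) -> u32 {
    let d = if f_bits > g_bits { f_bits } else { g_bits };
    if d < 46 {
        (49 * d + 80) / 17
    } else {
        (49 * d + 57) / 17
    }
}
```

For example, `bernstein_yang_iterations(256, 256)` evaluates to 741 and `bernstein_yang_iterations(2043, 2042)` to 5892, agreeing with the observed counts.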
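The improved CT algorithm needs 590 iterations for 256-bit inputs; extrapolating linearly to bigger sizes would mean 2360 iterations for U1024 and 4720 for U2048. That is roughly 20% fewer iterations than the current bounds (741, 2954 and 5906), so there's potentially a ~25% speedup to be had here, assuming the cost per iteration stays the same.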
Though perhaps we should just go full binary GCD: #755
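For reference, the extrapolation arithmetic above can be checked with a small sketch. It assumes the 590-iteration figure scales linearly with bit size, which is only an estimate (the real bound for larger sizes would have to come from the safegcd-bounds calculation), and both helper names are hypothetical.

```rust
/// Linear extrapolation of the 590-iteration figure for 256-bit inputs.
/// Assumption: iterations scale proportionally with the bit size.
fn extrapolated_iterations(bits: u32) -> u32 {
    590 * bits / 256
}

/// Percentage of iterations saved when going from `current` to `improved`.
fn percent_fewer(current: u32, improved: u32) -> f64 {
    100.0 * f64::from(current - improved) / f64::from(current)
}
```

With the current 256-bit bound of 741, `percent_fewer(741, 590)` comes out a little above 20, which corresponds to a speedup factor of about 741 / 590 ≈ 1.26.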