Conversion to base 10 is significantly slower than in similar libraries
Other libraries like GMP use a divide-and-conquer algorithm: find a power of 10 with roughly half as many digits as the number, use it as the base to split the original number into a 2-digit number in that base, convert each digit recursively to base 10, and finally concatenate the results. Assuming we have an O(n log n) multiplication routine, this algorithm is O(n log² n), which would be faster than the current O(n √n) algorithm.
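To illustrate the split concretely, here is one level of that recursion on a small number (my own sketch in Python, not GMP's actual code):

```python
# One level of the divide-and-conquer split: pick a power of 10 with
# roughly half as many digits as n, and write n as two "digits" in that base.
n = 123456789
k = 5  # 10**5 has about half as many digits as n
hi, lo = divmod(n, 10**k)
assert (hi, lo) == (1234, 56789)
# Recursing on hi and lo and concatenating (zero-padding lo to k digits)
# reproduces the decimal representation:
assert str(hi) + str(lo).zfill(k) == str(n)
```

Each level halves the digit count, so the divisions form the usual O(n log² n) recursion tree given subquadratic multiplication.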
The current algorithm came in #216, and I mentioned then:

> I honestly expect there's still room for improvement,

Help is welcome!
I'm interested in helping, num-bigint is the default choice for arbitrary precision integers in Rust so I think it is important to make it fast.
The first step is to merge #282 or something like that since the above algorithm requires a fast multiplication algorithm.
EDIT: I should have started by reading the work by @HKalbasi on PR #316, it addresses pretty much everything I mentioned in my comment 😅
I ran into this issue as well. Here are my findings, for reference.
I'm converting 2^2^23 to a string using the following Rust code:

```rust
use num_bigint::BigUint;

fn main() {
    (BigUint::from(1u32) << (1u32 << 23)).to_string();
}
```
Here's the corresponding timing, compared with Python 3.13:

```console
$ time ./target/release/bigint
8,91s user 0,01s system 99% cpu 8,923 total
$ time PYTHONINTMAXSTRDIGITS=0 python -c "str(1 << (1 << 23))"
0,27s user 0,01s system 99% cpu 0,287 total
```
Interestingly enough, the Python conversion is implemented in pure Python: https://github.com/python/cpython/blob/2905690a91bf72cdf0fb919b5193849bb67732e2/Lib/_pylong.py#L178-L179

It does leverage the power of the decimal module, though. Here is the simplified code, which performs identically to the previous benchmark (i.e. about 300 ms):
```python
import decimal
import functools


def convert(n: int) -> str:
    @functools.cache
    def power10(k: int) -> decimal.Decimal:
        return decimal.Decimal(2) ** k

    def inner(n, w):
        if w <= 200:
            return decimal.Decimal(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi = n >> w2
        lo = n & ((1 << w2) - 1)
        return inner(lo, w2) + inner(hi, w1) * power10(w2)

    unbounded_dec_context = decimal.getcontext().copy()
    unbounded_dec_context.prec = decimal.MAX_PREC
    unbounded_dec_context.Emax = decimal.MAX_EMAX
    with decimal.localcontext(unbounded_dec_context):
        return str(inner(n, n.bit_length()))


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)
```
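The trick in the decimal version is that it splits by *bits* rather than by decimal digits, so the expensive step is a shift and a mask instead of a big division; the halves are recombined via the identity n = hi·2^w2 + lo, evaluated exactly in Decimal arithmetic under an unbounded context. A minimal standalone check of that identity (my own sketch, not CPython code):

```python
import decimal

# Unbounded context so the Decimal arithmetic below is exact.
ctx = decimal.getcontext().copy()
ctx.prec = decimal.MAX_PREC
ctx.Emax = decimal.MAX_EMAX
with decimal.localcontext(ctx):
    n = 123456789123456789
    w2 = n.bit_length() // 2
    hi, lo = n >> w2, n & ((1 << w2) - 1)  # split by bits: shift + mask only
    # Recombine in base 10 (Decimal) using n == hi * 2**w2 + lo.
    rebuilt = decimal.Decimal(lo) + decimal.Decimal(hi) * decimal.Decimal(2) ** w2
    assert str(rebuilt) == str(n)
```

The binary-to-decimal work is thus pushed entirely into Decimal multiplications, which the decimal module implements with fast algorithms.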
However, it also provides a fallback implementation that doesn't rely on the decimal module:
https://github.com/python/cpython/blob/2905690a91bf72cdf0fb919b5193849bb67732e2/Lib/_pylong.py#L187-L190

Here is the corresponding simplified code:
```python
import math
import functools


def convert(n: int) -> str:
    @functools.cache
    def power10(k: int) -> int:
        return 10**k

    def inner(n, w):
        if w <= 1000:
            return str(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi, lo = divmod(n, power10(w2))
        return inner(hi, w1) + inner(lo, w2).zfill(w2)

    w = int(n.bit_length() * math.log10(2) + 1)
    s = inner(n, w)
    if s[0] == "0" and n:
        s = s.lstrip("0")
    return s


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)
```
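One subtlety in this fallback worth noting: the low half must be zero-padded to exactly w2 digits (the `zfill(w2)` call), otherwise any leading zeros in the remainder are silently lost. A small demonstration (my own example, not from the thread):

```python
n = 1000001
hi, lo = divmod(n, 10**3)  # split around 10^3
assert (str(hi), str(lo)) == ("1000", "1")
# Naive concatenation drops the leading zeros of the low half:
assert str(hi) + str(lo) == "10001"             # wrong result
assert str(hi) + str(lo).zfill(3) == "1000001"  # correct result
```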
It is much slower, but still faster than the num-bigint implementation (about 3 seconds vs. 9 seconds):

```console
$ time PYTHONINTMAXSTRDIGITS=0 python -O test.py
3,06s user 0,02s system 99% cpu 3,102 total
```
I ported it to Rust with the following implementation:
```rust
use num_bigint::BigUint;
use num_integer::Integer;
use std::collections::HashMap;

fn bigint_to_string(n: &BigUint) -> String {
    let mut cache: HashMap<u32, BigUint> = HashMap::new();

    fn inner(n: &BigUint, w: u32, cache: &mut HashMap<u32, BigUint>) -> String {
        if w <= 1000 {
            return n.to_string();
        }
        let q = w / 2;
        let w1 = q + w % 2;
        let w2 = q;
        let power10 = cache
            .entry(w2)
            .or_insert_with_key(|k| BigUint::from(10u32).pow(*k));
        let (hi, lo) = n.div_rem(power10);
        let hi_str = inner(&hi, w1, cache);
        let lo_str = inner(&lo, w2, cache);
        format!(
            "{}{}{}",
            hi_str,
            "0".repeat(w2 as usize - lo_str.len()),
            lo_str
        )
    }

    let w = (n.bits() as f64 * 2f64.log10() + 1f64) as u32;
    let s = inner(n, w, &mut cache);
    if s.starts_with('0') && n != &BigUint::from(0u32) {
        s.trim_start_matches('0').to_string()
    } else {
        s
    }
}

fn main() {
    let x = BigUint::from(1u32) << (1u32 << 23);
    bigint_to_string(&x);
    // assert_eq!(bigint_to_string(&x), x.to_string());
}
```
However, it does not perform better than the current num-bigint implementation:

```console
$ time ./target/release/bigint
8,91s user 0,01s system 99% cpu 8,929 total
```
I suspect this is because the `n.div_rem(power10)` operation is slower than the corresponding Python operation. ~I might open another issue about this.~
EDIT: I created the corresponding issue, see https://github.com/rust-num/num-bigint/issues/323.
I also used the Rust `int_divmod` from https://github.com/rust-num/num-bigint/issues/323#issuecomment-2701876848 to improve on `bigint_to_string`, see this gist.
This new implementation is now 10 times faster than the current `to_string()` implementation:

```console
$ time target/release/bigint
0,82s user 0,01s system 99% cpu 0,838 total
```