
Converting to base 10 is significantly slower than in similar libraries

HKalbasi opened this issue 1 year ago • 3 comments

Other libraries like GMP use a divide-and-conquer algorithm: find a power of 10 with roughly half as many digits as the number, use it as a base to split the original number into a two-digit number in that base, convert each digit recursively to base 10, and finally concatenate the results. Assuming an O(n log n) multiplication routine, this algorithm runs in O(n log^2 n), which is faster than the current O(n sqrt n) algorithm.
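For concreteness, here is a minimal Python sketch of the scheme described above (an editorial illustration, not GMP's actual code): split on a power of 10 with roughly half the digits, recurse on each half, and concatenate. Note that CPython's divmod here is schoolbook division, so this sketch only shows the structure; the asymptotic gain requires subquadratic multiplication and division.

```python
# Divide-and-conquer decimal conversion: write n = hi * 10**k + lo with
# k about half the decimal length of n, convert hi and lo recursively,
# and concatenate (zero-padding lo to exactly k digits).
def to_decimal(n: int) -> str:
    if n < 10**18:
        # Base case: small enough for the naive conversion.
        return str(n)
    # Estimate half the decimal digit count from the bit length
    # (log10(2) ~ 0.30103).
    k = int(n.bit_length() * 0.30103) // 2
    hi, lo = divmod(n, 10**k)
    return to_decimal(hi) + to_decimal(lo).zfill(k)
```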

HKalbasi avatar Dec 13 '24 12:12 HKalbasi

The current algorithm came in #216, and I mentioned then:

I honestly expect there's still room for improvement,

Help is welcome!

cuviper avatar Dec 13 '24 18:12 cuviper

I'm interested in helping; num-bigint is the default choice for arbitrary-precision integers in Rust, so I think it is important to make it fast.

The first step is to merge #282 or something like that, since the above algorithm requires a fast multiplication routine.

HKalbasi avatar Dec 13 '24 19:12 HKalbasi

EDIT: I should have started by reading the work by @HKalbasi on PR #316; it addresses pretty much everything I mentioned in my comment 😅


I ran into this issue as well. Here are my findings, for reference.

I'm converting 2^2^23 to a string using the following Rust code:

use num_bigint::BigUint;

fn main() {
    (BigUint::from(1u32) << (1u32 << 23)).to_string();
}

Here's the corresponding timing, and the comparison with python 3.13:

$ time ./target/release/bigint
8,91s user 0,01s system 99% cpu 8,923 total
$ time PYTHONINTMAXSTRDIGITS=0 python -c "str(1 << (1 << 23))"
0,27s user 0,01s system 99% cpu 0,287 total

Interestingly enough, the Python conversion is implemented in pure Python: https://github.com/python/cpython/blob/2905690a91bf72cdf0fb919b5193849bb67732e2/Lib/_pylong.py#L178-L179

It does leverage the power of the decimal module though. Here is the simplified code, which performs identically to the previous benchmark (i.e. about 300 ms):

import decimal
import functools


def convert(n: int) -> str:

    @functools.cache
    def power10(k: int) -> decimal.Decimal:
        # Despite the name, this is a power of 2 (as a Decimal): the
        # recursion below splits n on its binary representation.
        return decimal.Decimal(2) ** k

    def inner(n, w):
        if w <= 200:
            return decimal.Decimal(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi = n >> w2
        lo = n & ((1 << w2) - 1)
        return inner(lo, w2) + inner(hi, w1) * power10(w2)

    unbounded_dec_context = decimal.getcontext().copy()
    unbounded_dec_context.prec = decimal.MAX_PREC
    unbounded_dec_context.Emax = decimal.MAX_EMAX
    with decimal.localcontext(unbounded_dec_context):
        return str(inner(n, n.bit_length()))


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)
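Why the decimal module pays off here (my understanding, not verified against libmpdec internals): a Decimal stores its coefficient in base 10, so the final str() is essentially a linear copy, while the heavy multiplications are handled by libmpdec. A quick sanity check that Decimal arithmetic in an unbounded context is exact on integers:

```python
import decimal

# Sanity check: with sufficient precision, Decimal arithmetic on
# integers is exact, so Decimal(2) ** k really equals the integer 2**k.
ctx = decimal.getcontext().copy()
ctx.prec = decimal.MAX_PREC
ctx.Emax = decimal.MAX_EMAX
with decimal.localcontext(ctx):
    k = 10_000
    d = decimal.Decimal(2) ** k
    assert int(d) == 2 ** k
    # str() on a Decimal reads off the stored base-10 coefficient,
    # which is why the final conversion step is cheap.
    assert str(d) == str(2 ** k)
```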

However it also provides a fallback implementation that doesn't rely on the decimal module: https://github.com/python/cpython/blob/2905690a91bf72cdf0fb919b5193849bb67732e2/Lib/_pylong.py#L187-L190

Here is the simplified corresponding code:

import math
import functools


def convert(n: int) -> str:

    @functools.cache
    def power10(k: int) -> int:
        return 10**k

    def inner(n, w):
        if w <= 1000:
            return str(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi, lo = divmod(n, power10(w2))
        return inner(hi, w1) + inner(lo, w2).zfill(w2)

    w = int(n.bit_length() * math.log10(2) + 1)
    s = inner(n, w)
    if s[0] == "0" and n:
        s = s.lstrip("0")
    return s


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)
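A side note on the final lstrip (my illustration, not from the original code): the digit-count estimate w can overshoot the true decimal length by one, in which case the zero-padding inside the recursion can leave a leading zero on the result. For example:

```python
import math

# For n = 999, bit_length() is 10, so the estimate gives
# int(10 * log10(2) + 1) = 4, one more than the true 3 digits.
n = 999
w = int(n.bit_length() * math.log10(2) + 1)
assert w == 4
assert len(str(n)) == 3
```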

It is much slower, but still faster than the num-bigint implementation (about 3 seconds vs. 9 seconds):

$ time PYTHONINTMAXSTRDIGITS=0 python -O test.py
3,06s user 0,02s system 99% cpu 3,102 total

I ported it to rust with the following implementation:

use num_bigint::BigUint;
use num_integer::Integer;
use std::collections::HashMap;

fn bigint_to_string(n: &BigUint) -> String {
    let mut cache: HashMap<u32, BigUint> = HashMap::new();

    fn inner(n: &BigUint, w: u32, cache: &mut HashMap<u32, BigUint>) -> String {
        if w <= 1000 {
            return n.to_string();
        }
        let q = w / 2;
        let w1 = q + w % 2;
        let w2 = q;
        let power10 = cache
            .entry(w2)
            .or_insert_with_key(|k| BigUint::from(10u32).pow(*k));
        let (hi, lo) = n.div_rem(power10);
        let hi_str = inner(&hi, w1, cache);
        let lo_str = inner(&lo, w2, cache);
        format!(
            "{}{}{}",
            hi_str,
            "0".repeat(w2 as usize - lo_str.len()),
            lo_str
        )
    }

    let w = (n.bits() as f64 * 2f64.log10() + 1f64) as u32;
    let s = inner(n, w, &mut cache);
    if s.starts_with('0') && n != &BigUint::from(0u32) {
        s.trim_start_matches('0').to_string()
    } else {
        s
    }
}

fn main() {
    let x = BigUint::from(1u32) << (1u32 << 23);
    bigint_to_string(&x);
    // assert_eq!(bigint_to_string(&x), x.to_string());
}

However it does not perform better than the num-bigint implementation:

$ time ./target/release/bigint                        
8,91s user 0,01s system 99% cpu 8,929 total

I suspect that this is because the n.div_rem(power10) operation is slower than the corresponding Python implementation. ~~I might open another issue about this.~~

EDIT: I created the corresponding issue, see https://github.com/rust-num/num-bigint/issues/323. I also used the Rust int_divmod from https://github.com/rust-num/num-bigint/issues/323#issuecomment-2701876848 to improve on bigint_to_string, see this gist.
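To isolate that suspicion, one can time a single balanced division on its own in Python and compare it with the equivalent div_rem call in Rust (an editorial micro-benchmark sketch; absolute numbers will vary by machine):

```python
import time

# Time one balanced divmod, as performed at the top of the recursion:
# split a 200,000-bit number by a power of ten with about half as many
# decimal digits.
n = 1 << 200_000
digits = int(n.bit_length() * 0.30103) + 1   # rough decimal digit count
d = 10 ** (digits // 2)

start = time.perf_counter()
hi, lo = divmod(n, d)
elapsed = time.perf_counter() - start

# Sanity check: the split is exact.
assert hi * d + lo == n
print(f"balanced divmod of a {n.bit_length()}-bit number: {elapsed:.4f}s")
```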

This new implementation is now 10 times faster than the current to_string() implementation:

$ time target/release/bigint
0,82s user 0,01s system 99% cpu 0,838 total

vxgmichel avatar Mar 05 '25 15:03 vxgmichel