num-bigint icon indicating copy to clipboard operation
num-bigint copied to clipboard

Is it normal that to_string() for bigint takes so much(8-10 mins)?

Open Moldoteck opened this issue 5 years ago • 14 comments

Well, i have an big int with ~50K digits. And when i want to print it/write it to file, to_string() takes very much. Is there a way to boost this? P.s. i'm using --release P.P.s this number is result for calculating factorial(1000000)

Moldoteck avatar Jan 20 '19 13:01 Moldoteck

Converting such a large binary number to decimal requires a lot of division by 10 (or powers of 10). I measured 560 seconds (9m20s) for this number from BigUint. For comparison, the same in Python took me 384 seconds (6m24s) -- same order of magnitude, but we do have room for improvement.

It's much faster to use hexadecimal -- less than 10ms.

cuviper avatar Jan 24 '19 01:01 cuviper

The ideea is that: When i do something like write!(filename,{:?},mynum) It is almost instant, but it writes the number as json like many numbers separated by comma. It is possible to take advantage from this and print number faster by iterating the data array?

Moldoteck avatar Jan 24 '19 06:01 Moldoteck

We don't directly expose the data array, because we may change the internal details in the future, like choosing a different word size. However, if you enable the "serde" feature of num-bigint, you can use serde_json for this: https://docs.rs/serde_json/1.0.37/serde_json/#creating-json-by-serializing-data-structures

This is quite fast, and also has the advantage that you can simply deserialize it. Any other serde format will work for this too.

cuviper avatar Jan 24 '19 18:01 cuviper

decimal conversion takes so long because bigint doesn't use an efficient algorithm. when using bc from GNU coreutils (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

using the Schönhage–Strassen algorithm for multiplication with Newton's method for division along with recursive subdivision for the base conversion should improve the algorithm to O(n * log(n)^j * log(log(n))^k) running time where j and k are small numbers that I won't bother to compute.

From what I recall, the currently implemented algorithm has O(n^3) running time, which is much worse (around 10000^3 operations, which would take around 300s at 3Gops/s).

programmerjake avatar Aug 25 '19 10:08 programmerjake

PRs welcome!

cuviper avatar Aug 25 '19 14:08 cuviper

decimal conversion takes so long because bigint doesn't use an efficient algorithm. when using bc from GNU coreutils (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

I read the source code of bc then I found that the reason why it outputs numbers so fast is that bc uses decimal digits representation internally...

chirsz-ever avatar Sep 10 '19 12:09 chirsz-ever

decimal conversion takes so long because bigint doesn't use an efficient algorithm. when using bc from GNU coreutils (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

I read the source code of bc then I found that the reason why it outputs numbers so fast is that bc uses decimal digits representation internally...

Well, if bc does use decimal, then all the time would be spent on converting 2^166096 to decimal in the process of calculating that power of 2, so it wouldn't be much faster.

I just tried using libgmp (through python3 with gmpy2), converting 2^33219280 (has exactly 10,000,000 decimal digits) takes around 2s on my desktop (Ryzen 5 2600).

from gmpy2 import mpz
len((mpz(1) << 33219280).digits())

I know for a fact that libgmp uses either base 2^32 or 2^64 internally.

programmerjake avatar Sep 10 '19 13:09 programmerjake

This will get fixed by #169.

tczajka avatar Oct 15 '20 04:10 tczajka

This will get fixed by #169.

Thanks 👍

Moldoteck avatar Oct 15 '20 07:10 Moldoteck

For display purposes, we often types we don't care about the exact value, but we want a visual representation of some sort.For example I'm using bigint in solang, and if the source code is:

contract C {
    uint public a = 0x42 << 0x1000;
}

The error message is:

error: value 68929666173268065441655678907297209250277640437127329505359409896740957926122792152022569587709838850498747043152991347307179846920262378079373581798190108344433160451884859890463864948869325408752252620782492450940837043400249041214374089166742716754789321974583044468704747216847828201554886533717469914441337972474197993872495771592557222079572243926566348973999761965241895651054178391269138225751793146952429238097003702352676784449795583594206767356612476729034555526070556182110034136368176157825790350818918680381187579004110590505254414061021345580607020527860296286314915887827214755800931314516484844723571229390274210314498301135085782169058787061833048081352669155166748440562902015274828003678438834758235744983872502940358864173884374362418421893050854849978562010362466242620085674985353278126304735758052063795187738140115584115393670010597080656489254414544708321337728396892217877211121649029592465037162750628525088209561398145517576740389232774117638701953386535741992764967317125237969417691252073510511769718651225882588926353299502641873735260637594404246088340952162161101156716117693793415175187367098516802584372524774806321669598085959288834521960432288439734685792771594710206781956091110750466608176562176 does not fit into type uint256.

How about a to_string_abbreviated() which prints something like:

  • 68929666173e500
  • 68929666173..(500 digits)..6608176562176
  • 68929666173..(abbreviated)..6608176562176

Not sure on the exact format - suggestions welcome.

seanyoung avatar May 24 '23 07:05 seanyoung

Would the project consider accepting a PR that changes the BigInt type to be something along the lines of

pub struct BigInt<T: BigIntFormatting = Short> {
    sign: Sign,
    data: BigUint,
    formatter: PhantomData<T>,
}

In order to provide a "fast" default that prints normally only up until an acceptable size and uses scientific notation beyond that, but users still retain the ability to opt-into the current behavior relatively easily. I find this slightly better than .to_string_abbreviated() because it protects the unaware without overly restricting them.

All of this wouldn't be needed if Rust had custom formatting specifiers, but that's not the Rust we have today :)

estebank avatar Sep 27 '23 16:09 estebank

I think a type parameter would be way too intrusive. I also think that defaulting to an abbreviated form would be a surprising change for the "unaware" on the other end of the spectrum.

cuviper avatar Sep 27 '23 16:09 cuviper

@cuviper would the prior suggestion of a new method with conditional formatting predicated on a length check be acceptable instead?

estebank avatar Sep 27 '23 17:09 estebank

Let's maybe start with an abbreviated form, and leave conditionals external for now. I fear that might still be somewhat slow to find the upper decimal digits, but maybe it can discard a lot with large divisors. I'm open to experimentation.

cuviper avatar Mar 05 '24 01:03 cuviper