num-bigint
Is it normal that to_string() for bigint takes so long (8-10 mins)?
Well, I have a big int with ~50K digits, and when I want to print it or write it to a file, to_string() takes very long. Is there a way to speed this up? P.S. I'm building with --release. P.P.S. This number is the result of calculating factorial(1000000).
Converting such a large binary number to decimal requires a lot of division by 10 (or powers of 10). I measured 560 seconds (9m20s) for this number from BigUint. For comparison, the same in Python took me 384 seconds (6m24s) -- same order of magnitude, but we do have room for improvement.
It's much faster to use hexadecimal -- less than 10ms.
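For illustration, a minimal sketch of writing the hex form (the 2^166096 value and the file name are just stand-ins of similar size):

use num_bigint::BigUint;
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Stand-in value with roughly 50K decimal digits.
    let n = BigUint::from(1u32) << 166_096usize;
    let mut file = File::create("number.hex")?;
    // BigUint implements LowerHex, so {:x} walks the binary words
    // directly -- no repeated division by 10.
    writeln!(file, "{:x}", n)?;
    Ok(())
}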
The idea is this: when I do something like write!(file, "{:?}", mynum), it is almost instant, but it writes the number JSON-like, as many numbers separated by commas. Is it possible to take advantage of this and print the number faster by iterating over the data array?
We don't directly expose the data array, because we may change the internal details in the future, like choosing a different word size. However, if you enable the "serde" feature of num-bigint, you can use serde_json for this:
https://docs.rs/serde_json/1.0.37/serde_json/#creating-json-by-serializing-data-structures
This is quite fast, and also has the advantage that you can simply deserialize it. Any other serde format will work for this too.
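A minimal sketch of that route, assuming num-bigint with the "serde" feature enabled plus serde_json as dependencies:

use num_bigint::BigUint;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let n = BigUint::from(1u32) << 166_096usize;
    // Serializes the internal digit words as-is -- no base-10 conversion.
    let json = serde_json::to_string(&n)?;
    std::fs::write("number.json", &json)?;
    // The value round-trips losslessly.
    let back: BigUint = serde_json::from_str(&json)?;
    assert_eq!(n, back);
    Ok(())
}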
Decimal conversion takes so long because bigint doesn't use an efficient algorithm.
When using GNU bc (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.
Using the Schönhage–Strassen algorithm for multiplication, with Newton's method for division and recursive subdivision for the base conversion (sketched below), should improve the running time to O(n * log(n)^j * log(log(n))^k), where j and k are small numbers that I won't bother to compute.
From what I recall, the currently implemented algorithm has O(n^3) running time, which is much worse (around 10000^3 operations, which would take around 300 s at 3 Gops/s).
PRs welcome!
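To illustrate just the recursive subdivision piece, here is a rough sketch on top of num-bigint 0.4 (to_decimal is a hypothetical helper, not a crate API). Note that with the crate's current schoolbook division the split itself stays quadratic, so the asymptotic win also needs subquadratic multiplication and division underneath:

use num_bigint::BigUint;
use num_traits::Zero;

fn to_decimal(n: &BigUint) -> String {
    // Small inputs: the built-in conversion is fine.
    if n.bits() <= 512 {
        return n.to_string();
    }
    // Split at a power of ten near half the decimal length and
    // convert each half recursively.
    let k = (n.bits() as f64 * std::f64::consts::LOG10_2 / 2.0) as u32;
    let pivot = BigUint::from(10u32).pow(k);
    let (hi, lo) = (n / &pivot, n % &pivot);
    if hi.is_zero() {
        lo.to_string()
    } else {
        // Zero-pad the low half to exactly k digits.
        format!("{}{:0>width$}", to_decimal(&hi), to_decimal(&lo), width = k as usize)
    }
}

Splitting near the decimal midpoint keeps the recursion balanced; each level then costs one big division, which is where the fast division algorithm has to come in.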
> Decimal conversion takes so long because bigint doesn't use an efficient algorithm. When using GNU bc (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

I read the source code of bc, and I found that the reason it outputs numbers so fast is that bc uses a decimal-digit representation internally...
Well, if bc does use decimal, then all the time would be spent on converting 2^166096 to decimal in the process of calculating that power of 2, so it wouldn't be much faster.
I just tried using libgmp (through Python 3 with gmpy2); converting 2^33219280 (which has exactly 10,000,000 decimal digits) takes around 2s on my desktop (Ryzen 5 2600).

# gmpy2 wraps libgmp's mpz integer type.
from gmpy2 import mpz
# digits() converts to a base-10 string; 2^33219280 has exactly 10,000,000 of them.
len((mpz(1) << 33219280).digits())
I know for a fact that libgmp uses either base 2^32 or 2^64 internally.
This will get fixed by #169.
Thanks 👍
For display purposes, we often don't care about the exact value; we just want a visual representation of some sort. For example, I'm using bigint in Solang, and if the source code is:
contract C {
    uint public a = 0x42 << 0x1000;
}
The error message is:
error: value 68929666173268065441655678907297209250277640437127329505359409896740957926122792152022569587709838850498747043152991347307179846920262378079373581798190108344433160451884859890463864948869325408752252620782492450940837043400249041214374089166742716754789321974583044468704747216847828201554886533717469914441337972474197993872495771592557222079572243926566348973999761965241895651054178391269138225751793146952429238097003702352676784449795583594206767356612476729034555526070556182110034136368176157825790350818918680381187579004110590505254414061021345580607020527860296286314915887827214755800931314516484844723571229390274210314498301135085782169058787061833048081352669155166748440562902015274828003678438834758235744983872502940358864173884374362418421893050854849978562010362466242620085674985353278126304735758052063795187738140115584115393670010597080656489254414544708321337728396892217877211121649029592465037162750628525088209561398145517576740389232774117638701953386535741992764967317125237969417691252073510511769718651225882588926353299502641873735260637594404246088340952162161101156716117693793415175187367098516802584372524774806321669598085959288834521960432288439734685792771594710206781956091110750466608176562176 does not fit into type uint256.
How about a to_string_abbreviated() which prints something like:
68929666173e500
68929666173..(500 digits)..6608176562176
68929666173..(abbreviated)..6608176562176
Not sure on the exact format - suggestions welcome.
Would the project consider accepting a PR that changes the BigInt type to be something along the lines of
pub struct BigInt<T: BigIntFormatting = Short> {
    sign: Sign,
    data: BigUint,
    formatter: PhantomData<T>,
}
In order to provide a "fast" default that prints normally only up to an acceptable size and uses scientific notation beyond that, while users still retain the ability to opt into the current behavior relatively easily. I find this slightly better than .to_string_abbreviated() because it protects the unaware without overly restricting them.
All of this wouldn't be needed if Rust had custom formatting specifiers, but that's not the Rust we have today :)
I think a type parameter would be way too intrusive. I also think that defaulting to an abbreviated form would be a surprising change for the "unaware" on the other end of the spectrum.
@cuviper would the prior suggestion of a new method with conditional formatting predicated on a length check be acceptable instead?
Let's maybe start with an abbreviated form, and leave conditionals external for now. I fear that might still be somewhat slow to find the upper decimal digits, but maybe it can discard a lot with large divisors. I'm open to experimentation.
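For anyone experimenting, a rough sketch of the large-divisor idea, again assuming num-bigint 0.4 (abbreviate and its output format are hypothetical, not a crate API): the trailing digits come from a single modulo, and the leading digits from one big division after estimating the decimal length from the bit length:

use num_bigint::BigUint;

fn abbreviate(n: &BigUint, keep: u32) -> String {
    // Decimal digit count estimated from the bit length; this is
    // exact or one too high.
    let digits = (n.bits() as f64 * std::f64::consts::LOG10_2) as u32 + 1;
    if digits <= 2 * keep {
        return n.to_string();
    }
    // Trailing digits: one modulo by 10^keep, zero-padded below.
    let low = (n % BigUint::from(10u32).pow(keep)).to_string();
    // Leading digits: one large division discards everything below the
    // top `keep` digits (one fewer if the estimate overshot).
    let high = n / BigUint::from(10u32).pow(digits - keep);
    format!("{}..({} digits total)..{:0>width$}", high, digits, low, width = keep as usize)
}

Most of the time presumably goes into building the two big powers of ten, so whether this is fast enough in practice is exactly the open question.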