Doing multiplication into the result directly.
Sorry for not submitting a pull request, I'm working on a bignum library and I saw your implementation of multiplication so I just want to point out that you can simplify it by doing the multiply and the add in one loop into the result bn as follow:
typedef unsigned int u32;
typedef unsigned long long int u64;
void _bn_add_mul_u32(u32 *out, u32 *a, size_t a_len, u32 b)
{
u64 c = 0;
size_t i;
for(i=0; i<a_len; i++)
{
u64 a_w = a[i];
u64 out_w = out[i];
u64 o = out_w + a_w * b + c;
c = o >> 32;
out[i] = o & 0xffffffff;
}
for(; c; i++)
{
u64 o = out[i] + c;
c = o >> 32;
out[i] = o & 0xffffffff;
}
}
void _bn_mul(u32 *out, u32 *a, size_t a_len, u32 *b, size_t b_len)
{
for(size_t i=0; i<b_len; i++)
{
u32 b_w = b[i];
_bn_add_mul_u32(out+i, a, a_len, b_w);
}
}
Hey @aameen951 - sorry for the slow response.
Thank you for the tip, and no worries about not leaving a PR - a tip like this with code and all, is valuable to me :)
However, "straight from the hip" I am not completely sure what it would translate to, in my library - I would have to take some time to understand each step in your example code, and translate it.
Unfortunately, it is a task that will have to wait a bit - but maybe in my summer's vacation :)
Anyways - thanks for the contribution Ameen. I will leave this issue open and mark it as TODO/help wanted, in case anyone can easily grasp your idea, and make a PR, before I get to it.
Hi @kokke,
Your library is very useful and clear, I add one ⭐ .
However, I use calculations which require millions of calls to the bignum_* functions and I notice an extremely slow execution time (hours) whereas the same algorithm with standard types is instantaneous (one second).
Do you have any ideas like the one proposed by @aameen951 to optimize the "big" loops in your library ?
Here are my stats for one execution of my program :
-------------------------------------
| bignum function | number of calls |
-------------------------------------
| bignum_inc | 65972 |
| bignum_mul | 19856726 |
| bignum_cmp | 20516421 |
| bignum_add | 13193836 |
| bignum_divmod | 13193836 |
| bignum_from_int | 7256613 |
-------------------------------------
Greetings, JL.
Hi @JL2014 and thanks for your kind words :)
This library is using very simplistic algorithms and not much optimization at all.
Multiplication and division is usually quite slow, so that's where I would start looking.
For better performance, you could consider using GMP https://gmplib.org/ -- very clear code as well, but also very optimized and fast.
Do you have any ideas like the one proposed by @aameen951 to optimize the "big" loops in your library ?
A simple optimization would be to add a "counter"/bitmap of which words have bits set, so you can skip them. E.g. when adding 1 +1, only the least-significant-word is needed.
I think several guys have proposed PRs and optimizations in issues over the time, I just haven't had time/energy to incorporate them or even review much of it.