tiny-bignum-c Doing multiplication into the result directly.

Sorry for not submitting a pull request, I'm working on a bignum library and I saw your implementation of multiplication so I just want to point out that you can simplify it by doing the multiply and the add in one loop into the result bn as follow:

typedef unsigned int           u32;
typedef unsigned long long int u64;
void _bn_add_mul_u32(u32 *out, u32 *a, size_t a_len, u32 b)
{
  u64 c = 0;
  size_t i;
  for(i=0; i<a_len; i++)
  {
    u64 a_w = a[i];
    u64 out_w = out[i];
    u64 o = out_w + a_w * b + c;
    c = o >> 32;
    out[i] = o & 0xffffffff;
  }
  for(; c; i++)
  {
    u64 o = out[i] + c;
    c = o >> 32;
    out[i] = o & 0xffffffff;
  }
}
void _bn_mul(u32 *out, u32 *a, size_t a_len, u32 *b, size_t b_len)
{
  for(size_t i=0; i<b_len; i++)
  {
    u32 b_w = b[i];
    _bn_add_mul_u32(out+i, a, a_len, b_w);
  }
}

May 11 '20 13:05 aameen951

Hey @aameen951 - sorry for the slow response.

Thank you for the tip, and no worries about not leaving a PR - a tip like this with code and all, is valuable to me :)

However, "straight from the hip" I am not completely sure what it would translate to, in my library - I would have to take some time to understand each step in your example code, and translate it.

Unfortunately, it is a task that will have to wait a bit - but maybe in my summer's vacation :)

Anyways - thanks for the contribution Ameen. I will leave this issue open and mark it as TODO/help wanted, in case anyone can easily grasp your idea, and make a PR, before I get to it.

Jun 29 '20 21:06 kokke

Hi @kokke,

Your library is very useful and clear, I add one ⭐ .

However, I use calculations which require millions of calls to the bignum_* functions and I notice an extremely slow execution time (hours) whereas the same algorithm with standard types is instantaneous (one second).

Do you have any ideas like the one proposed by @aameen951 to optimize the "big" loops in your library ?

Here are my stats for one execution of my program :

-------------------------------------
| bignum function | number of calls |
-------------------------------------
|   bignum_inc    |           65972 |
|   bignum_mul    |        19856726 |
|   bignum_cmp    |        20516421 |
|   bignum_add    |        13193836 |
|  bignum_divmod  |        13193836 |
| bignum_from_int |         7256613 |
-------------------------------------

Greetings, JL.

Nov 09 '22 11:11 JL2014

Hi @JL2014 and thanks for your kind words :)

This library is using very simplistic algorithms and not much optimization at all.

Multiplication and division is usually quite slow, so that's where I would start looking.

For better performance, you could consider using GMP https://gmplib.org/ -- very clear code as well, but also very optimized and fast.

Do you have any ideas like the one proposed by @aameen951 to optimize the "big" loops in your library ?

A simple optimization would be to add a "counter"/bitmap of which words have bits set, so you can skip them. E.g. when adding 1 +1, only the least-significant-word is needed.

I think several guys have proposed PRs and optimizations in issues over the time, I just haven't had time/energy to incorporate them or even review much of it.

Nov 29 '22 09:11 kokke