libtommath Add a new mp_todec

that uses Barrett reduction to speed up stringifying large integers.

Some benchmarking, done with a lightly modified demo/timing.c:

CLK_PER_SEC == 3600313344
mp_torad of 2**1 =>  22224156/sec,       162 cycles, 0.0000 seconds
mp_todec of 2**1 =>   1492667/sec,      2412 cycles, 0.0000 seconds
mp_torad of 13**1 =>  15385954/sec,       234 cycles, 0.0000 seconds
mp_todec of 13**1 =>   1503890/sec,      2394 cycles, 0.0000 seconds

mp_torad of 2**2 =>  22224156/sec,       162 cycles, 0.0000 seconds
mp_todec of 2**2 =>   1503890/sec,      2394 cycles, 0.0000 seconds
mp_torad of 13**2 =>  11765729/sec,       306 cycles, 0.0000 seconds
mp_todec of 13**2 =>   1459981/sec,      2466 cycles, 0.0000 seconds

mp_torad of 2**4 =>  14286957/sec,       252 cycles, 0.0000 seconds
mp_todec of 2**4 =>   1459981/sec,      2466 cycles, 0.0000 seconds
mp_torad of 13**4 =>   7143478/sec,       504 cycles, 0.0000 seconds
mp_todec of 13**4 =>   1273996/sec,      2826 cycles, 0.0000 seconds

mp_torad of 2**8 =>  11112078/sec,       324 cycles, 0.0000 seconds
mp_todec of 2**8 =>   1438974/sec,      2502 cycles, 0.0000 seconds
mp_torad of 13**8 =>   4348204/sec,       828 cycles, 0.0000 seconds
mp_todec of 13**8 =>    639033/sec,      5634 cycles, 0.0000 seconds

mp_torad of 2**16 =>   7143478/sec,       504 cycles, 0.0000 seconds
mp_todec of 2**16 =>   1242344/sec,      2898 cycles, 0.0000 seconds
mp_torad of 13**16 =>   2325783/sec,      1548 cycles, 0.0000 seconds
mp_todec of 13**16 =>    381712/sec,      9432 cycles, 0.0000 seconds

mp_torad of 2**32 =>   3704026/sec,       972 cycles, 0.0000 seconds
mp_todec of 2**32 =>    617337/sec,      5832 cycles, 0.0000 seconds
mp_torad of 13**32 =>   1000087/sec,      3600 cycles, 0.0000 seconds
mp_todec of 13**32 =>    216469/sec,     16632 cycles, 0.0000 seconds

mp_torad of 2**64 =>   2040993/sec,      1764 cycles, 0.0000 seconds
mp_todec of 2**64 =>    347856/sec,     10350 cycles, 0.0000 seconds
mp_torad of 13**64 =>    376680/sec,      9558 cycles, 0.0000 seconds
mp_todec of 13**64 =>    115952/sec,     31050 cycles, 0.0000 seconds

mp_torad of 2**128 =>    869640/sec,      4140 cycles, 0.0000 seconds
mp_todec of 2**128 =>    197255/sec,     18252 cycles, 0.0000 seconds
mp_torad of 13**128 =>    122484/sec,     29394 cycles, 0.0000 seconds
mp_todec of 13**128 =>     59778/sec,     60228 cycles, 0.0000 seconds

mp_torad of 2**256 =>    336729/sec,     10692 cycles, 0.0000 seconds
mp_todec of 2**256 =>    103851/sec,     34668 cycles, 0.0000 seconds
mp_torad of 13**256 =>     35017/sec,    102816 cycles, 0.0000 seconds
mp_todec of 13**256 =>     29042/sec,    123966 cycles, 0.0000 seconds

mp_torad of 2**512 =>    107305/sec,     33552 cycles, 0.0000 seconds
mp_todec of 2**512 =>     53494/sec,     67302 cycles, 0.0000 seconds
mp_torad of 13**512 =>      9245/sec,    389430 cycles, 0.0001 seconds
mp_todec of 13**512 =>     14733/sec,    244368 cycles, 0.0001 seconds

mp_torad of 2**1024 =>     30425/sec,    118332 cycles, 0.0000 seconds
mp_todec of 2**1024 =>     52900/sec,     68058 cycles, 0.0000 seconds
mp_torad of 13**1024 =>      4827/sec,    745865 cycles, 0.0002 seconds
mp_todec of 13**1024 =>     13844/sec,    260046 cycles, 0.0001 seconds

mp_torad of 2**2048 =>     15603/sec,    230742 cycles, 0.0001 seconds
mp_todec of 2**2048 =>     26176/sec,    137538 cycles, 0.0000 seconds
mp_torad of 13**2048 =>      1269/sec,   2835144 cycles, 0.0008 seconds
mp_todec of 13**2048 =>      6411/sec,    561582 cycles, 0.0002 seconds

mp_torad of 2**4096 =>      4148/sec,    867797 cycles, 0.0002 seconds
mp_todec of 2**4096 =>     12749/sec,    282384 cycles, 0.0001 seconds
mp_torad of 13**4096 =>       323/sec,  11136150 cycles, 0.0031 seconds
mp_todec of 13**4096 =>      2852/sec,   1262358 cycles, 0.0004 seconds

mp_torad of 2**8192 =>      1081/sec,   3328344 cycles, 0.0009 seconds
mp_todec of 2**8192 =>      5991/sec,    600948 cycles, 0.0002 seconds
mp_torad of 13**8192 =>        81/sec,  44357363 cycles, 0.0123 seconds
mp_todec of 13**8192 =>      1198/sec,   3004920 cycles, 0.0008 seconds

mp_torad of 2**16384 =>       274/sec,  13116996 cycles, 0.0036 seconds
mp_todec of 2**16384 =>      2677/sec,   1344762 cycles, 0.0004 seconds
mp_torad of 13**16384 =>        20/sec, 176360310 cycles, 0.0490 seconds
mp_todec of 13**16384 =>       474/sec,   7579854 cycles, 0.0021 seconds

mp_torad of 2**32768 =>        69/sec,  51832746 cycles, 0.0144 seconds
mp_todec of 2**32768 =>      1146/sec,   3139524 cycles, 0.0009 seconds
mp_torad of 13**32768 =>         5/sec, 702303642 cycles, 0.1951 seconds
mp_todec of 13**32768 =>       176/sec,  20395908 cycles, 0.0057 seconds

mp_torad of 2**65536 =>        17/sec, 205633044 cycles, 0.0571 seconds
mp_todec of 2**65536 =>       456/sec,   7885224 cycles, 0.0022 seconds
mp_torad of 13**65536 =>         1/sec, 2803178970 cycles, 0.7786 seconds
mp_todec of 13**65536 =>        57/sec,  62507790 cycles, 0.0174 seconds

mp_torad of 2**131072 =>         4/sec, 820253700 cycles, 0.2278 seconds
mp_todec of 2**131072 =>       165/sec,  21787452 cycles, 0.0061 seconds
mp_torad of 13**131072 =>         0/sec, 11200695156 cycles, 3.1110 seconds
mp_todec of 13**131072 =>        17/sec, 200245194 cycles, 0.0556 seconds

mp_torad of 2**262144 =>         1/sec, 3273670044 cycles, 0.9093 seconds
mp_todec of 2**262144 =>        53/sec,  67087368 cycles, 0.0186 seconds
mp_torad of 13**262144 =>         0/sec, 44720842806 cycles, 12.4214 seconds
mp_todec of 13**262144 =>         5/sec, 661838760 cycles, 0.1838 seconds

mp_torad of 2**524288 =>         0/sec, 13101782489 cycles, 3.6391 seconds
mp_todec of 2**524288 =>        16/sec, 217065492 cycles, 0.0603 seconds
mp_torad of 13**524288 =>         0/sec, 177629371344 cycles, 49.3372 seconds
mp_todec of 13**524288 =>         1/sec, 2334986874 cycles, 0.6486 seconds

Aug 23 '19 17:08 MasterDuke17

See #328

Aug 23 '19 17:08 MasterDuke17

:+1:

Regarding the API I think it would be best if we replace the mp_todecimal macros by specialised functions. These could either branch to a fast variant if available or branch to the generic toradix. The mp_todecimal_fast function from this PR should then be internal and named s_mp_todecimal_fast

Aug 24 '19 09:08 minad

Now that I’m thinking about it, a lot of functions have the name of the algorithm in them. Maybe make it internal and rename to s_mp_todecimal_barrett?

Sent from my iPhone

On Aug 24, 2019, at 10:44 AM, Daniel Mendler [email protected] wrote:

👍

Regarding the API I think it would be best if we replace the mp_todecimal macros by specialised functions. These could either branch to a fast variant if available or branch to the generic toradix. The mp_todecimal_fast function from this PR should then be internal and named s_mp_todecimal_fast

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

Aug 24 '19 10:08 MasterDuke17

These could either branch to a fast variant if available or branch to the generic toradix.

What do you mean, "if available"? Should I change this PR to also change the mp_todecimal macro into a function that chooses mp_toradix under a certain size and chooses my new function above?

One difficulty is that mp_toradix assume it's given a buffer of the necessary size, but my function grows the buffer because mp_radix_size is so slow. However, @czurnieden said they have a new mp_radix_size that's O(1) in the works, do we need to wait until that's merged?

Aug 24 '19 12:08 MasterDuke17

If available - via the macro configuration, see #262

Aug 24 '19 15:08 minad

I would prefer if we use a radix size function and don't grow the buffer.

Aug 24 '19 15:08 minad

Why is it called todecimal?

Is there a from_decimal available?

But after a quick first glimpse: looks good. It needs some formatting and adaption to the style here and a test and documentation but otherwise…

However, @czurnieden said they have a new mp_radix_size that's O(1) in the works, do we need to wait until that's merged?

It is in a larger PR but independent of the rest of that PR. Needs to get updated to the new state of LibTomMath (more or less just the small_int_2_big_int parts, nothing significant) but no other hurdles as far as I can see. As whined about in #328 I'm stuck up to the upper edge of my lower lip in work untill the end of August, so if I get a ~~kick in the a~~thumbs-up from the regulars I'll squeeze it in this WE otherwise use mp_ilogb in the meantime.

Aug 24 '19 16:08 czurnieden

@minad

I would prefer if we use a radix size function and don't grow the buffer.

I can switch to assuming the passed in buffer is the correct size, but it'll take me a little bit of time to re-work.

@czurnieden

It needs some formatting and adaption to the style here

I tried to follow the style and it passes make astyle, what is wrong?

a test and documentation

Yep I can work on that, but I wanted confirmation I was on the right path first, so thanks.

Aug 24 '19 20:08 MasterDuke17

A style/correctness question. Should I move the 3 arrays of mp_ints that are passed around outside the functions so they don't have to be passed in every call? So

mp_int nL[20], shiftL[20], mL[20];

mp_err mp_todecimal_fast_rec(mp_int *number, int precalc_array_index, int left, char **result) {
...
}

instead of the current

mp_err mp_todecimal_fast_rec(mp_int *number, mp_int *nL, mp_int *shiftL, mp_int *mL, int precalc_array_index, int left, char **result) {
...
}

Aug 25 '19 07:08 MasterDuke17

Timings with the new version that expects to be passed a pre-allocated buffer (i.e., similar API to mp_toradix). It got a bit faster, now the crossover point with mp_toradix is closer to 2^1k than 2^2k.

CLK_PER_SEC == 3600292536
mp_torad of 2**1 =>  22224028/sec,       162 cycles, 0.0000 seconds
mp_todec of 2**1 =>   1818329/sec,      1980 cycles, 0.0000 seconds
mp_torad of 13**1 =>  14286875/sec,       252 cycles, 0.0000 seconds
mp_todec of 13**1 =>   1785859/sec,      2016 cycles, 0.0000 seconds

mp_torad of 2**2 =>  16668021/sec,       216 cycles, 0.0000 seconds
mp_todec of 2**2 =>   1852002/sec,      1944 cycles, 0.0000 seconds
mp_torad of 13**2 =>  11112014/sec,       324 cycles, 0.0000 seconds
mp_todec of 13**2 =>   1801948/sec,      1998 cycles, 0.0000 seconds

mp_torad of 2**4 =>  14286875/sec,       252 cycles, 0.0000 seconds
mp_todec of 2**4 =>   1801948/sec,      1998 cycles, 0.0000 seconds
mp_torad of 13**4 =>   7143437/sec,       504 cycles, 0.0000 seconds
mp_todec of 13**4 =>   1613034/sec,      2232 cycles, 0.0000 seconds

mp_torad of 2**8 =>  11112014/sec,       324 cycles, 0.0000 seconds
mp_todec of 2**8 =>   1801948/sec,      1998 cycles, 0.0000 seconds
mp_torad of 13**8 =>   4444805/sec,       810 cycles, 0.0000 seconds
mp_todec of 13**8 =>    766345/sec,      4698 cycles, 0.0000 seconds

mp_torad of 2**16 =>   6452137/sec,       558 cycles, 0.0000 seconds
mp_todec of 2**16 =>   1600130/sec,      2250 cycles, 0.0000 seconds
mp_torad of 13**16 =>   2299037/sec,      1566 cycles, 0.0000 seconds
mp_todec of 13**16 =>    451503/sec,      7974 cycles, 0.0000 seconds

mp_torad of 2**32 =>   4000325/sec,       900 cycles, 0.0000 seconds
mp_todec of 2**32 =>    749124/sec,      4806 cycles, 0.0000 seconds
mp_torad of 13**32 =>   1015310/sec,      3546 cycles, 0.0000 seconds
mp_todec of 13**32 =>    256102/sec,     14058 cycles, 0.0000 seconds

mp_torad of 2**64 =>   2062023/sec,      1746 cycles, 0.0000 seconds
mp_todec of 2**64 =>    406537/sec,      8856 cycles, 0.0000 seconds
mp_torad of 13**64 =>    363665/sec,      9900 cycles, 0.0000 seconds
mp_todec of 13**64 =>    141154/sec,     25506 cycles, 0.0000 seconds

mp_torad of 2**128 =>    921733/sec,      3906 cycles, 0.0000 seconds
mp_todec of 2**128 =>    230965/sec,     15588 cycles, 0.0000 seconds
mp_torad of 13**128 =>    120419/sec,     29898 cycles, 0.0000 seconds
mp_todec of 13**128 =>     74744/sec,     48168 cycles, 0.0000 seconds

mp_torad of 2**256 =>    332805/sec,     10818 cycles, 0.0000 seconds
mp_todec of 2**256 =>    126512/sec,     28458 cycles, 0.0000 seconds
mp_torad of 13**256 =>     34955/sec,    102996 cycles, 0.0000 seconds
mp_todec of 13**256 =>     38875/sec,     92610 cycles, 0.0000 seconds

mp_torad of 2**512 =>    105940/sec,     33984 cycles, 0.0000 seconds
mp_todec of 2**512 =>     67459/sec,     53370 cycles, 0.0000 seconds
mp_torad of 13**512 =>      9089/sec,    396090 cycles, 0.0001 seconds
mp_todec of 13**512 =>     19411/sec,    185472 cycles, 0.0001 seconds

mp_torad of 2**1024 =>     30141/sec,    119448 cycles, 0.0000 seconds
mp_todec of 2**1024 =>     46150/sec,     78012 cycles, 0.0000 seconds
mp_torad of 13**1024 =>      4917/sec,    732096 cycles, 0.0002 seconds
mp_todec of 13**1024 =>     19351/sec,    186048 cycles, 0.0001 seconds

mp_torad of 2**2048 =>     15745/sec,    228654 cycles, 0.0001 seconds
mp_todec of 2**2048 =>     36241/sec,     99342 cycles, 0.0000 seconds
mp_torad of 13**2048 =>      1274/sec,   2824362 cycles, 0.0008 seconds
mp_todec of 13**2048 =>      8878/sec,    405504 cycles, 0.0001 seconds

mp_torad of 2**4096 =>      4170/sec,    863334 cycles, 0.0002 seconds
mp_todec of 2**4096 =>     17672/sec,    203724 cycles, 0.0001 seconds
mp_torad of 13**4096 =>       324/sec,  11106486 cycles, 0.0031 seconds
mp_todec of 13**4096 =>      3837/sec,    938070 cycles, 0.0003 seconds

mp_torad of 2**8192 =>      1085/sec,   3315510 cycles, 0.0009 seconds
mp_todec of 2**8192 =>      8256/sec,    436068 cycles, 0.0001 seconds
mp_torad of 13**8192 =>        82/sec,  43743563 cycles, 0.0122 seconds
mp_todec of 13**8192 =>      1577/sec,   2282616 cycles, 0.0006 seconds

mp_torad of 2**16384 =>       276/sec,  13006458 cycles, 0.0036 seconds
mp_todec of 2**16384 =>      3589/sec,   1003068 cycles, 0.0003 seconds
mp_torad of 13**16384 =>        20/sec, 174191328 cycles, 0.0484 seconds
mp_todec of 13**16384 =>       643/sec,   5597280 cycles, 0.0016 seconds

mp_torad of 2**32768 =>        70/sec,  51329502 cycles, 0.0143 seconds
mp_todec of 2**32768 =>      1555/sec,   2314188 cycles, 0.0006 seconds
mp_torad of 13**32768 =>         5/sec, 695193174 cycles, 0.1931 seconds
mp_todec of 13**32768 =>       254/sec,  14172930 cycles, 0.0039 seconds

mp_torad of 2**65536 =>        17/sec, 203966838 cycles, 0.0567 seconds
mp_todec of 2**65536 =>       637/sec,   5649137 cycles, 0.0016 seconds
mp_torad of 13**65536 =>         1/sec, 2778189192 cycles, 0.7717 seconds
mp_todec of 13**65536 =>        97/sec,  36809424 cycles, 0.0102 seconds

mp_torad of 2**131072 =>         4/sec, 813089286 cycles, 0.2258 seconds
mp_todec of 2**131072 =>       250/sec,  14385618 cycles, 0.0040 seconds
mp_torad of 13**131072 =>         0/sec, 11102668398 cycles, 3.0838 seconds
mp_todec of 13**131072 =>        35/sec, 100849896 cycles, 0.0280 seconds

mp_torad of 2**262144 =>         1/sec, 3254062661 cycles, 0.9038 seconds
mp_todec of 2**262144 =>        96/sec,  37234620 cycles, 0.0103 seconds
mp_torad of 13**262144 =>         0/sec, 44394883092 cycles, 12.3309 seconds
mp_todec of 13**262144 =>        13/sec, 272151000 cycles, 0.0756 seconds

mp_torad of 2**524288 =>         0/sec, 12977733078 cycles, 3.6046 seconds
mp_todec of 2**524288 =>        35/sec, 101212992 cycles, 0.0281 seconds
mp_torad of 13**524288 =>         0/sec, 177622493220 cycles, 49.3356 seconds
mp_todec of 13**524288 =>         4/sec, 737604720 cycles, 0.2049 seconds

Aug 26 '19 20:08 MasterDuke17

@minad #262 hasn't been merged. Are you saying wait until it has? Or rebase this branch onto feature-detection2 and use its MP_HAS?

Aug 28 '19 19:08 MasterDuke17

Thanks for this PR!

Sorry for blocking you guys, I will have more time from next week on

Yes, please rebase, squash together into logical commits and use MP_HAS

Aug 29 '19 07:08 sjaeckel

... squash already happened ... :+1:

Aug 29 '19 07:08 sjaeckel

Close in favor of #331

Sep 01 '19 13:09 minad

Re-opened as #331 has its base to a non-existant branch now

Sep 02 '19 11:09 sjaeckel

c.f. https://github.com/libtom/libtommath/pull/331#issuecomment-526922246

Snprintf should not be used.

what else?

btw. I can't reproduce the error (yet)

c.f. https://github.com/libtom/libtommath/pull/331#issuecomment-526863425

how does this look?

looks fine IMO, I'll have a closer look soon

two things already now:

this function should get a size_t maxlen argument (c.f. #332)
how about leaving mp_todecimal as a macro and calling s_mp_todecimal_fast() from mp_to_radix() if the radix == 10?

I'm in the process of adding tests, but I'm not sure about documentation. How much is needed and where does it go?

c.f. docs: as it's no public API endpoint documentation is not really required, but if I could choose I'd really love to see an explanation on how the algorithm works :-)

c.f. tests: as already proposed by @minad 'You can test against mp_toradix for random numbers.' which turned into a '...could test against..., but that one is deprecated now.' - probably we could test against mp_read_radix()?

Sep 02 '19 13:09 sjaeckel

Snprintf should not be used.

Instead you should convert the number to a string "by hand". It does not make sense to call out to a very expensive (heavyweight and slow) printf function here. I want to use tommath in scenarios without libc and in particular without printf/sprintf/snprintf. They are not available on microcontrollers or if compiling to bare metal.

how about leaving mp_todecimal as a macro and calling s_mp_todecimal_fast() from mp_to_radix() if the radix == 10?

Not possible since MP_HAS is only available privately and we should keep it like this.

 c.f. docs: as it's no public API endpoint documentation is not really required, but if I could choose I'd really love to see an explanation on how the algorithm works :-)

:+1:

c.f. tests: as already proposed by @minad 'You can test against mp_toradix for random numbers.' which turned into a '...could test against..., but that one is deprecated now.' - probably we could test against mp_read_radix()?

We still have mp_toradix and I guess we will keep it? Or we will have mp_to_radix as in #332. You can write the test against mp_toradix as I have proposed.

Sep 02 '19 15:09 minad

how about leaving mp_todecimal as a macro and calling s_mp_todecimal_fast() from mp_to_radix() if the radix == 10?

Not possible since MP_HAS is only available privately and we should keep it like this.

I'm thinking about a code-path in the implementation of mp_to_radix() which would lead to not being able to test this function against mp_to_radix() as it would call itself... something like

$ git diff -U1
diff --git a/bn_mp_to_radix.c b/bn_mp_to_radix.c
index c75ee5b..2317dc4 100644
--- a/bn_mp_to_radix.c
+++ b/bn_mp_to_radix.c
@@ -29,2 +29,6 @@ mp_err mp_to_radix(const mp_int *a, char *str, size_t maxlen, int radix)
 
+   if (MP_HAS(S_MP_TODECIMAL_FAST) && radix == 10) {
+      return s_mp_todecimal_fast(a, str, maxlen);
+   }
+
    if ((err = mp_init_copy(&t, a)) != MP_OKAY) {

Sep 02 '19 16:09 sjaeckel

Snprintf should not be used.

Instead you should convert the number to a string "by hand". It does not make sense to call out to a very expensive (heavyweight and slow) printf function here. I want to use tommath in scenarios without libc and in particular without printf/sprintf/snprintf. They are not available on microcontrollers or if compiling to bare metal.

fine by me

Sep 02 '19 16:09 sjaeckel

Snprintf should not be used.

Instead you should convert the number to a string "by hand". It does not make sense to call out to a very expensive (heavyweight and slow) printf function here. I want to use tommath in scenarios without libc and in particular without printf/sprintf/snprintf. They are not available on microcontrollers or if compiling to bare metal.

fine by me

I have this implemented, but I'm currently trying to get this branch rebased onto the HEAD of develop.

c.f. docs: as it's no public API endpoint documentation is not really required, but if I could choose I'd really love to see an explanation on how the algorithm works :-)

Ha, I'm sorry, but I can't help there (other than just pointing to https://en.wikipedia.org/wiki/Barrett_reduction and hoping something there makes more sense to you than to me). I don't understand it myself, I just adopted/translated some code.

1. this function should get a `size_t maxlen` argument  (c.f. #332)

I'll work on that once I get the rebase completed.

2. how about leaving `mp_todecimal` as a macro and calling `s_mp_todecimal_fast()` from `mp_to_radix()` if the `radix == 10`?

Same here.

Sep 02 '19 20:09 MasterDuke17

Snprintf should not be used.

You can use mp_to_radix which also allows for bigger chunks at the tree-leaves to speed it up.

I'd really love to see an explanation on how the algorithm works

At first I thought it was Bouvier et al. but it is really "just" D&C (Schoenhage, Knuth, Bernstein, and probably many more but I think Arnold Schoenhage was first) and uses Barret reduction as the divsion algorithm.

Oh, and my question is still relevant: why is it called todecimal? The algorithm itself allows all bases >= 2, why the restriction to decimal? And why no fromdecimal, the algorithm works both ways?

Sep 02 '19 21:09 czurnieden

Snprintf should not be used.

You can use mp_to_radix which also allows for bigger chunks at the tree-leaves to speed it up.

At the point in the code where I was using snprintf I just have ints, not mp_ints, so I would assume the specialized code I just pushed would be faster, but could be persuaded otherwise.

Oh, and my question is still relevant: why is it called todecimal? The algorithm itself allows all bases >= 2, why the restriction to decimal? And why no fromdecimal, the algorithm works both ways?

Heh, it's just because I didn't realize it would work for other bases (and I personally was most interested in decimal). I'll try adopting all bases (and renaming to something like s_mp_to_radix_fast). Figuring out how to do it in reverse might take me longer, I'm not sure I can promise that one.

Sep 02 '19 21:09 MasterDuke17

I see I'll need to implement a max length parameter to pass on to the new mp_to_radix. Or is the suggestion in https://github.com/libtom/libtommath/pull/330#issuecomment-527201525 what I should do instead of turning a macro into a function?

Sep 02 '19 21:09 MasterDuke17

At first I thought it was Bouvier et al. but it is really "just" D&C (Schoenhage, Knuth, Bernstein, and probably many more but I think Arnold Schoenhage was first) and uses Barret reduction as the divsion algorithm.

@czurnieden I wasn't kidding when I said I'm not a mathematician. If the algorithm you linked is faster and someone implements that instead I'll be happy. I just want faster stringification in libtommath, if it comes from code I didn't write that's one less thing for me to worry about/follow in the future.

Sep 02 '19 21:09 MasterDuke17

the specialized code I just pushed would be faster, but could be persuaded otherwise.

Not a matter of persuasion more a matter of benchmarking. Using bigger chunks (200-300 bits or so, maybe more) with mp_to_radix might or might not be faster than using small chunks with some hand-optimized conversion. You need to test it out.

Figuring out how to do it in reverse might take me longer

Take your time, nobody is rushing you.

If the algorithm you linked is faster

I don't know, probably.

But I didn't link it for that reason, the link was meant as a possible explanation; it is what I thought you did the first time you mentioned it (the "without division" part to be exact) before I looked at your code and the original from Lars Hellström. Both are connected nevertheless, yours is more or less a simplified and also specialized version (base 2 to base 10 only).

if it comes from code I didn't write that's one less thing for me to worry about/follow in the future.

You started it, you have the right to finish it.

Sep 02 '19 22:09 czurnieden

how about leaving mp_todecimal as a macro and calling s_mp_todecimal_fast() from mp_to_radix() if the radix == 10?

Not possible since MP_HAS is only available privately and we should keep it like this.

@minad so you don't like the proposed diff in https://github.com/libtom/libtommath/pull/330#issuecomment-527201525? Do you want me to do to the new mp_to_decimal macro what I did with the old mp_todecimal macro (i.e., turn it into a simple wrapper function)?

Sep 03 '19 06:09 MasterDuke17

Would it be possible to get the generalized version for arbitrary bases replacing the current impl?

Sep 03 '19 11:09 minad

Would it be possible to get the generalized version for arbitrary bases replacing the current impl?

Once I get the current version renamed and accepting (and using) a maxlen parameter I'll attempt to convert to an arbitrary base implementation. However,

Both are connected nevertheless, yours is more or less a simplified and also specialized version (base 2 to base 10 only).

I'll admit I don't understand the algorithm enough the know how easy it would be to extend to the rest of the bases. Should I just extend it to 2..10?

Sep 03 '19 11:09 MasterDuke17

Would it be possible to get the generalized version for arbitrary bases replacing the current impl?

let's first get this one finished by @MasterDuke17 and then we'll have a look on how to get the generalized version, okay?

Sep 03 '19 12:09 sjaeckel

How do people feel about me creating a new PR (or aggressively cleaning up history for this one) once I have the maxlen support implemented? I think at that point it could be merged as is (with some tests and documentation) and then another new one created with a more generalized implementation.

Sep 03 '19 12:09 MasterDuke17

libtommath
libtommath copied to clipboard

Add a new mp_todec_fast...

libtommath libtommath copied to clipboard

Add a new mp_todec_fast...

libtommath
libtommath copied to clipboard