NTT multiplication and Newton-Raphson division
Multiplication gets maximum 800,000 times faster.
Raises Multiply size too large (ArgumentError) if size is larger than the limitation:
x * y requires [x.n_significant_digits, y.n_significant_digits].min <= 603979776
x = BigDecimal('9'*(9<<26)); x * x
# 270 days(estimated) → 29 seconds
BigDecimal('9'*(9<<27)).div(BigDecimal('7'*(9<<26)), 9<<26)
# 84 days(estimated) → 184 seconds
Basic policy of this pull request
- Make calculation fast in a wide range of precisions with relatively low amount of code and low maintenance cost
- Only use NTT and achieve O(nlogn) with a limited (but enough) maximum precision
- Less conditional branch, No Karatsuba / Toom-3,4,5 / uint128-dependency
- Don't make the code complicated for just a small constant factor speedup
NTT(Numeric Theory Translation) multiplication
Calculates multiplication/convolution using NTT with three primes.
# Calculate convolution in mod prime1, prim2 and prime3
conv1 = ntt_inverse(ntt(a, prime1).zip(ntt(b, prime1)).map{ _1 * _2 }, prime1)
conv2 = ...
conv3 = ...
# Restore actual convolution from conv1, conv2, conv3
conv = restore_convolution_from_modulo(conv1, conv2, conv3)
Consider calculating convolution of two arrays. Each array is of size N with array[i] in 0..999999999.
Maximum value of convolution[i] is 999999999**2 * N. This value is larger than 64bit and smaller than 96bit, so we need three 32-bit primes: 29<<27|1, 26<<27|1, 24<<27|1.
These are three largest 32-bit primes that satisfies P > 999999999, and P-1 need to be a multiple of large powers of two.
Constraints from this primes, maximum N is 1<<27.
Combination of primes/sizes
| BASE | Primes | N | estimated speed (smaller is better) | memo |
|---|---|---|---|---|
| 10**9 | 32bit x 3 | 1<<27 | 1 | This pull request, No digits repacking |
| 10**3 | 32bit x 1 | 1<<12 | 1 | N is too small |
| 10**6 | 32bit x 2 | 1<<24 | 1 | N is small |
| 10**6 | 64bit x 1 | 1<<24 | 2 | N is small, needs uint128_t |
| 10**14 | 64bit x 2 | 1<<34 | 12/7 | Needs uint128_t |
| 10**3 | 64bit x 1 | 1<<39 | 4 | Needs uint128_t |
Multiplication of various size bigdecimal
Considering xx_xx_xx * yy
Calculate by convolution(ntt(xx_xx_xx), ntt(00_00_yy)) is possible, but repeating convolution(ntt(xx), ntt(yy)) is faster.
# aaaa_bbbb_cccc * yyyy
ntt_y = ntt(yyyy) # Calculate once and reuse
convolution(ntt(aaaa), ntt_y)
convolution(ntt(bbbb), ntt_y)
convolution(ntt(cccc), ntt_y)
If n_significant_digits is both larger than 1<<<26 == 603979776, multiplication fails with Error(too large).
Newton-Raphson division
X / Y can be calculated by X * Yinv
and Yinv can be calculated only by add/sub/mult using Newton's method.
x = 0.1
10.times { x = x * (2 - 7 * x) }
x #=> 0.14285714285714285 (== 1/7.0)
Division of various size bigdecimal
Considering 1111_1111_1111_1111.div(7777_7777_7777)
Required precision is 4. Calculating inverse of 7777_7777_7777 in 4 digit is enough.
1111_1111_1111_1111 * 1.285e-12 == 1428
Considering 1111_1111_1111_1111.div(7777)
We can calculate this by repeating xxxx_xxxx.divmod(7777) 3 times.
Calculating inverse of 7777 in 4 digit is enough.
Generic case: Split X into several blocks
xxx_xxxxx_xxxxx_xxxxx / yyyy
Can be calculated by repeating xxxx_xxxxx.divmod(yyyy) 3 times.
xxx_xxxx_xxxx / yyyyy
Can be calculated by repeating xxxxx_xxxx.divmod(yyyyy) 2 times.
xxxxx_xxxxxxx / yyyyy
Can be calculated by repeating xxxxxx_xxxxxxx.divmod(yyyyy) 1 time.
Tests
All bigdecimal test passes with NTT_MULTIPLICATION_THRESHOLD==1 and NEWTON_RAPHSON_DIVISION_THRESHOLD==1 (All mult/div uses NTT and Newton-Raphson)
These large multiplication test passes. (Too slow to add to CI)
x = BigDecimal('9'*(9<<26))
(x*x).to_s.match?(/^0\.9+80+1e\d+$/) #=> true
ss = 20.times.map{(9<<10).times.map{rand(0..9)}.join}
x = BigDecimal((3<<16).times.map{ss.sample}.join)
y = BigDecimal((1<<16).times.map{ss.sample}.join)
prime = 33333331
(x%prime)*(y%prime)%prime == (x*y)%prime #=> true