bigdecimal icon indicating copy to clipboard operation
bigdecimal copied to clipboard

NTT multiplication and Newton-Raphson division

Open tompng opened this issue 4 months ago • 0 comments

Multiplication gets maximum 800,000 times faster.

Raises Multiply size too large (ArgumentError) if size is larger than the limitation: x * y requires [x.n_significant_digits, y.n_significant_digits].min <= 603979776

x = BigDecimal('9'*(9<<26)); x * x
# 270 days(estimated) → 29 seconds

BigDecimal('9'*(9<<27)).div(BigDecimal('7'*(9<<26)), 9<<26)
# 84 days(estimated) → 184 seconds

Basic policy of this pull request

  • Make calculation fast in a wide range of precisions with relatively low amount of code and low maintenance cost
  • Only use NTT and achieve O(nlogn) with a limited (but enough) maximum precision
  • Less conditional branch, No Karatsuba / Toom-3,4,5 / uint128-dependency
  • Don't make the code complicated for just a small constant factor speedup

NTT(Numeric Theory Translation) multiplication

Calculates multiplication/convolution using NTT with three primes.

# Calculate convolution in mod prime1, prim2 and prime3
conv1 = ntt_inverse(ntt(a, prime1).zip(ntt(b, prime1)).map{ _1 * _2 }, prime1)
conv2 = ...
conv3 = ...
# Restore actual convolution from conv1, conv2, conv3
conv = restore_convolution_from_modulo(conv1, conv2, conv3)

Consider calculating convolution of two arrays. Each array is of size N with array[i] in 0..999999999. Maximum value of convolution[i] is 999999999**2 * N. This value is larger than 64bit and smaller than 96bit, so we need three 32-bit primes: 29<<27|1, 26<<27|1, 24<<27|1. These are three largest 32-bit primes that satisfies P > 999999999, and P-1 need to be a multiple of large powers of two. Constraints from this primes, maximum N is 1<<27.

Combination of primes/sizes

BASE Primes N estimated speed (smaller is better) memo
10**9 32bit x 3 1<<27 1 This pull request, No digits repacking
10**3 32bit x 1 1<<12 1 N is too small
10**6 32bit x 2 1<<24 1 N is small
10**6 64bit x 1 1<<24 2 N is small, needs uint128_t
10**14 64bit x 2 1<<34 12/7 Needs uint128_t
10**3 64bit x 1 1<<39 4 Needs uint128_t

Multiplication of various size bigdecimal

Considering xx_xx_xx * yy Calculate by convolution(ntt(xx_xx_xx), ntt(00_00_yy)) is possible, but repeating convolution(ntt(xx), ntt(yy)) is faster.

# aaaa_bbbb_cccc * yyyy
ntt_y = ntt(yyyy) # Calculate once and reuse
convolution(ntt(aaaa), ntt_y)
convolution(ntt(bbbb), ntt_y)
convolution(ntt(cccc), ntt_y)

If n_significant_digits is both larger than 1<<<26 == 603979776, multiplication fails with Error(too large).

Newton-Raphson division

X / Y can be calculated by X * Yinv and Yinv can be calculated only by add/sub/mult using Newton's method.

x = 0.1
10.times { x = x * (2 - 7 * x) }
x #=> 0.14285714285714285 (== 1/7.0)

Division of various size bigdecimal

Considering 1111_1111_1111_1111.div(7777_7777_7777) Required precision is 4. Calculating inverse of 7777_7777_7777 in 4 digit is enough. 1111_1111_1111_1111 * 1.285e-12 == 1428

Considering 1111_1111_1111_1111.div(7777) We can calculate this by repeating xxxx_xxxx.divmod(7777) 3 times. Calculating inverse of 7777 in 4 digit is enough.

Generic case: Split X into several blocks xxx_xxxxx_xxxxx_xxxxx / yyyy Can be calculated by repeating xxxx_xxxxx.divmod(yyyy) 3 times. xxx_xxxx_xxxx / yyyyy Can be calculated by repeating xxxxx_xxxx.divmod(yyyyy) 2 times. xxxxx_xxxxxxx / yyyyy Can be calculated by repeating xxxxxx_xxxxxxx.divmod(yyyyy) 1 time.

Tests

All bigdecimal test passes with NTT_MULTIPLICATION_THRESHOLD==1 and NEWTON_RAPHSON_DIVISION_THRESHOLD==1 (All mult/div uses NTT and Newton-Raphson)

These large multiplication test passes. (Too slow to add to CI)

x = BigDecimal('9'*(9<<26))
(x*x).to_s.match?(/^0\.9+80+1e\d+$/) #=> true

ss = 20.times.map{(9<<10).times.map{rand(0..9)}.join}
x = BigDecimal((3<<16).times.map{ss.sample}.join)
y = BigDecimal((1<<16).times.map{ss.sample}.join)
prime = 33333331
(x%prime)*(y%prime)%prime == (x*y)%prime #=> true

tompng avatar Aug 20 '25 17:08 tompng