flint Worklist for nfloat

Before I forget anything

fma and fmma that do the right thing when one term is smaller
polynomial multiplication
- block classical, karatsuba, waksman and maybe strassen with fixed-point arithmetic
- block multimodular
all _ui, _si, _fmpz variants
when the precision is more than a few limbs, have multiplication inspect the limbs to see if there are many trailing zeros -> strip off and do a normal mul (see https://github.com/fredrik-johansson/flint/commit/2b2c8a3f2c60a86fdcbea4a0dcae2233c5d1fd9f)
vec_neg
any other important missing vec functions
inv, div, sqrt, rsqrt
transcendental functions
micro-optimization: consider changing the exponent range and redefining the exponent of zero so that one can check x*y == 0 with EXP(x)+EXP(y) < MIN_EXP (saves branches detecting zeros in multiplication and in dot products)
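
A minimal sketch of the exponent trick in the last item, assuming a made-up layout: the `sk_` names, the struct and the three constants below are hypothetical and are not nfloat's actual representation. The exponent of zero is placed so far below `SK_MIN_EXP` that adding any valid exponent to it still lands below `SK_MIN_EXP`, and so that zero plus zero does not overflow `int64_t`. In this sketch the same comparison also fires when two nonzero operands would underflow, which needs special handling anyway.

```c
#include <stdint.h>

/* Hypothetical constants, chosen only so that the inequalities below hold. */
#define SK_MIN_EXP  (-(INT64_C(1) << 60))   /* smallest exponent of a nonzero value */
#define SK_MAX_EXP  (INT64_C(1) << 60)      /* largest exponent of a nonzero value */
#define SK_ZERO_EXP (-(INT64_C(1) << 62))   /* exponent reserved for zero */

/* Hypothetical float: value = (-1)^sign * mant * 2^(exp - 64), or 0 if exp == SK_ZERO_EXP. */
typedef struct
{
    int64_t exp;
    int sign;
    uint64_t mant;
} sk_float;

/* One comparison replaces separate zero tests on x and y:
   SK_ZERO_EXP + SK_MAX_EXP  = -3 * 2^60 < SK_MIN_EXP, and
   SK_ZERO_EXP + SK_ZERO_EXP = -2^63 still fits in int64_t. */
static int sk_mul_is_zero_or_underflow(const sk_float *x, const sk_float *y)
{
    return x->exp + y->exp < SK_MIN_EXP;
}
```

A multiplication or dot-product kernel would test this once per product and fall through to the ordinary mantissa multiply otherwise.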

May 27 '24 05:05 fredrik-johansson

I will just state it here as well: for inverses, I believe the fastest method for small $n$ is Newton iteration (and, judging by GMP, I suppose this extends to all sizes). Hardcoded inverse routines could be implemented for limb counts that are powers of two, simply generalizing mpn_invert_limb.
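
To illustrate the shape of such a routine, here is a toy sketch at a 32-bit scale. It is not GMP's or FLINT's code; the function name `invert_u32_newton` and the choice of a plain division as the 16-bit base case are assumptions made for the example. It computes floor((2^64 - 1) / d) for a normalized `d` by inverting the high half, doing one Newton step to double the precision, and then adjusting by a few units.

```c
#include <stdint.h>

/* Sketch: floor((2^64 - 1) / d) for a normalized 32-bit d (top bit set). */
static uint64_t invert_u32_newton(uint32_t d)
{
    /* Base case: reciprocal of the high 16 bits of d (a hardware division
       here), rescaled so that x approximates 2^64 / d to about 15 bits. */
    uint64_t dh = d >> 16;                  /* in [2^15, 2^16) */
    uint64_t x = (UINT32_MAX / dh) << 16;

    /* Newton step x <- x + x * e / 2^64 with e = 2^64 - d * x.
       e is computed modulo 2^64; since |e| < 2^50 its top bit gives the sign. */
    uint64_t e = 0 - (uint64_t) d * x;
    int neg = (int) (e >> 63);
    uint64_t mag = neg ? 0 - e : e;
    uint64_t corr = ((x >> 16) * (mag >> 16)) >> 32;  /* truncated x * |e| / 2^64 */
    x = neg ? x - corr : x + corr;

    /* Final adjustment: r = (2^64 - 1) - d * x modulo 2^64. The Newton step
       leaves x within a few units of the answer, so both loops are short. */
    uint64_t r = UINT64_MAX - (uint64_t) d * x;
    while (r >> 63) { x--; r += d; }        /* x too large: r is negative */
    while (r >= d)  { x++; r -= d; }        /* x too small */
    return x;
}
```

A routine for 2, 4, 8, ... limbs would presumably repeat the same pattern, with the previous size as the base case and multi-limb products in the Newton step.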

May 27 '24 10:05 albinahlback

There is also the basecase algorithm used by mpfr_divhigh_n_basecase and the variant described here: https://inria.hal.science/hal-04557431v1/document

May 27 '24 14:05 fredrik-johansson

> There is also the basecase algorithm used by mpfr_divhigh_n_basecase and the variant described here: https://inria.hal.science/hal-04557431v1/document

I saw that one. I'm wondering how a fast mpn_invert combined with the Granlund-Möller 2n-by-n division algorithm would compare to Sukop and Zimmermann's new algorithm.
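
For reference, the single-limb case of division with a precomputed reciprocal can be sketched as follows, scaled down to 32-bit limbs so that plain `uint64_t` arithmetic suffices. This follows the 2-by-1 scheme that GMP's `udiv_qrnnd_preinv` is based on; the `sk_` names are hypothetical, and the reciprocal is obtained by an ordinary division only to keep the example short.

```c
#include <stdint.h>

typedef struct { uint32_t q, r; } sk_qr;

/* v = floor((2^64 - 1) / d) - 2^32 for a normalized d (top bit set). */
static uint32_t sk_reciprocal(uint32_t d)
{
    return (uint32_t) (UINT64_MAX / d - (UINT64_C(1) << 32));
}

/* Divide u1 * 2^32 + u0 by d using the reciprocal v.
   Requires u1 < d and the top bit of d set. */
static sk_qr sk_div_2by1_preinv(uint32_t u1, uint32_t u0, uint32_t d, uint32_t v)
{
    /* Quotient estimate: high word of (v + 2^32) * u1 + u0, plus one. */
    uint64_t q = (uint64_t) v * u1 + (((uint64_t) u1 << 32) | u0);
    uint32_t q1 = (uint32_t) (q >> 32) + 1;   /* may wrap around modulo 2^32 */
    uint32_t q0 = (uint32_t) q;

    uint32_t r = u0 - q1 * d;                 /* candidate remainder modulo 2^32 */
    if (r > q0) { q1--; r += d; }             /* estimate was one too large */
    if (r >= d) { q1++; r -= d; }             /* rare: estimate was one too small */

    return (sk_qr) { q1, r };
}
```

The 2n-by-n variant replaces the single limbs with n-limb blocks and `v` with an n-limb inverse, which is where a fast mpn_invert would come in.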

May 27 '24 19:05 albinahlback