Worklist for nfloat
Before I forget anything
- fma and fmma that do the right thing when one term is smaller
- polynomial multiplication
- block classical, karatsuba, waksman and maybe strassen with fixed-point arithmetic
- block multimodular
- all _ui, _si, _fmpz variants
- when the precision is more than a few limbs, have multiplication inspect the limbs to see if there are many trailing zeros -> strip off and do a normal mul (see https://github.com/fredrik-johansson/flint/commit/2b2c8a3f2c60a86fdcbea4a0dcae2233c5d1fd9f)
- vec_neg
- any other important missing vec functions
- inv, div, sqrt, rsqrt
- transcendental functions
- micro-optimization: consider changing the exponent range and redefining the exponent of zero so that one can check x*y == 0 with EXP(x)+EXP(y) < MIN_EXP (saves branches detecting zeros in multiplication and in dot products)
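A minimal sketch of the exponent trick in the last item, to make the idea concrete. The struct layout, the `TOY_*` constants and the function name are all hypothetical and are not nfloat's actual definitions. The only assumption is that zero is assigned an exponent below `MIN_EXP - MAX_EXP`, so that adding any valid exponent to it still falls under `MIN_EXP`.

```c
#include <stdint.h>

/* Hypothetical exponent range, for illustration only. */
#define TOY_MIN_EXP  (-(INT64_C(1) << 40))
#define TOY_MAX_EXP  (INT64_C(1) << 40)
/* Exponent assigned to zero: below MIN_EXP - MAX_EXP, so that
   ZERO_EXP + (any valid exponent) < MIN_EXP, with no int64 overflow. */
#define TOY_ZERO_EXP (-(INT64_C(1) << 42))

typedef struct
{
    int64_t exp;     /* TOY_ZERO_EXP for zero, else in [MIN_EXP, MAX_EXP] */
    uint64_t mant;   /* normalized mantissa; ignored for zero */
} toy_float;

/* Returns 1 when x*y is certainly zero: either an operand is zero or
   the product underflows. One comparison replaces two zero tests. */
int toy_mul_is_zero(const toy_float *x, const toy_float *y)
{
    return x->exp + y->exp < TOY_MIN_EXP;
}
```

With this encoding the same comparison also catches genuine underflow of nonzero operands. The borderline case where the exponent sum equals `MIN_EXP` and normalization lowers it by one would still need its own check.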
I will just state it here as well: for inverses, I believe the fastest method for small $n$ is Newton iteration (and, judging by GMP, I suppose this extends to all sizes). Hardcoded inverse routines could be implemented for limb counts that are powers of two, simply generalizing mpn_invert_limb.
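To illustrate the Newton approach in the one-limb case, here is a rough sketch that computes the same quantity as a single-limb reciprocal, floor((2^128 - 1) / d) - 2^64 for a normalized 64-bit d. It is not GMP's code: the function names are made up, the linear starting guess and the final correction loop are choices made for simplicity, and `unsigned __int128` is a GCC/Clang extension. A multi-limb version would follow the same pattern with mpn arithmetic in place of the 128-bit type.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Reference value by direct division: floor((2^128 - 1) / d) - 2^64. */
uint64_t invert_limb_reference(uint64_t d)
{
    assert(d >> 63);
    return (uint64_t) (~(u128) 0 / d - ((u128) 1 << 64));
}

/* Same value, computed with Newton iteration plus a final correction. */
uint64_t invert_limb_newton(uint64_t d)
{
    uint64_t x;
    u128 q, r;
    int i;

    assert(d >> 63);   /* d must be normalized (top bit set) */

    /* x approximates 2^126 / d. Start from the linear guess 3 - 2D for
       1/D, where D = d / 2^64 lies in [1/2, 1); relative error <= 1/8. */
    x = (UINT64_C(3) << 62) - (d >> 1);

    /* Newton step x <- x * (2 - D x). Each step roughly doubles the
       number of correct bits until truncation error dominates. */
    for (i = 0; i < 5; i++)
    {
        u128 t = ((u128) 1 << 127) - (u128) d * x;   /* 2^126 * (2 - D x) */
        x = (uint64_t) (((u128) x * (uint64_t) (t >> 64)) >> 62);
    }

    /* Truncation keeps x <= 2^126 / d, so 4 (x - 1) does not exceed the
       exact quotient and the remainder below is nonnegative. */
    q = (u128) (x - 1) << 2;
    r = ~(u128) 0 - q * d;
    while (r >= d)   /* a small number of fix-up steps */
    {
        q++;
        r -= d;
    }

    return (uint64_t) (q - ((u128) 1 << 64));
}
```

The correction loop makes the result exact regardless of how tight the Newton error bound is, at the cost of a few extra subtractions. A tuned routine would instead bound the error precisely and use a table lookup for the starting value.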
There is also the basecase algorithm used by mpfr_divhigh_n_basecase and the variant described here: https://inria.hal.science/hal-04557431v1/document
I saw that one. I'm wondering how a fast mpn_invert combined with the Granlund-Möller 2n-by-n division algorithm would compare to Sukop's and Zimmermann's new algorithm.
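For reference, the one-limb case of division with a precomputed inverse (2 limbs by 1 limb) can be sketched as follows. This is my reading of the Möller-Granlund 2/1 step, written with `unsigned __int128` (a GCC/Clang extension) for brevity. The function name is hypothetical, and `v` is assumed to be floor((2^128 - 1) / d) - 2^64 for a normalized `d`.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Divides u1 * 2^64 + u0 by d and returns the quotient; the remainder
   is stored in *rem. Requires d normalized, u1 < d, and
   v = floor((2^128 - 1) / d) - 2^64. */
uint64_t div_2by1_preinv(uint64_t *rem, uint64_t u1, uint64_t u0,
                         uint64_t d, uint64_t v)
{
    u128 q;
    uint64_t q1, q0, r;

    assert((d >> 63) && u1 < d);

    /* Candidate quotient from one multiplication by the inverse. */
    q = (u128) v * u1 + (((u128) u1 << 64) | u0);
    q0 = (uint64_t) q;
    q1 = (uint64_t) (q >> 64) + 1;   /* wraps modulo 2^64 */

    /* Remainder modulo 2^64, followed by at most two adjustments. */
    r = u0 - q1 * d;
    if (r > q0)
    {
        q1--;
        r += d;
    }
    if (r >= d)   /* rarely taken */
    {
        q1++;
        r -= d;
    }

    *rem = r;
    return q1;
}
```

For example, with d = 2^63 and v = 2^64 - 1, dividing 2^64 (u1 = 1, u0 = 0) gives quotient 2 and remainder 0. The multi-limb comparison raised above would replace each limb operation with the corresponding mpn call.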