Implement and use middle products (basecase, Karatsuba and FFT-based)
This is relevant e.g. for Newton iteration and divide-and-conquer for series solutions to some linear ODEs.
Here are some references:
- A. Bostan, G. Lecerf, E. Schost, Tellegen's Principle into Practice, ISSAC '03: Proceedings of the 2003 international symposium on Symbolic and algebraic computation, 37–44.
- G. Hanrot, M. Quercia, P. Zimmermann, The Middle Product Algorithm, I. Speeding up the division and square root of power series, Applicable Algebra in Engineering, Communication and Computing, 2004, 14 (6), 415–438.
- J. van der Hoeven, Newton’s method and FFT trading, Journal of Symbolic Computation, Volume 45, Issue 8, 2010, 857–878.
- D. Harvey, Faster algorithms for the square root and reciprocal of power series, Math. Comp. 80 (2011), 387–394.
I learned about some of these references from @mezzarobba.
Keywords for search: mulmid, acb_poly_mulmid, arb_poly_mulmid.
- Part of https://github.com/flintlib/flint/issues/1650
Note that the simplest, balanced version of a middle product would compute the middle n coefficients of a 2n x n product. A more useful and general version would compute the coefficients in the range [clo, chi) of a general an x bn product. Depending on the parameters, this general middle product can degenerate into a balanced middle product, a full product, a low or high product, or a dot product, or something in between these regimes. Obviously this makes the tuning more involved.
Note that _nmod_poly_mul_mid_mpn_ctx already implements the general version of middle product, though it does FFT only. There is some skeleton code in fft_small/nmod_poly_mul.c to deal with some unbalanced cases, but it's incomplete and currently unused.
Fine-tuning information concerning _nmod_poly_mul_mid_mpn_ctx. In fact it's as much information as a question: is the following correct?
It seemed to me that due to the absence of transposed version of the truncated FFT and inverse truncated FFT in fft_small, the current implementation of the middle product can waste some factor, between 1 and 2, depending on the parameters (and comparing to what we could hope for if we had transposed variants as fast as the non-transposed ones). This is partly compensated by not using too long FFTs when there is a good wrap-around, but that only applies to some suitable parameters.