Optimise double operations in LLL
LLL will (at least in some cases) spend a significant portion of its time in the ldexp() calls in check_babai, _fmpz_vec_get_d_vec_2exp and possibly elsewhere. We could speed this up by precomputing a big table of power-of-two doubles and multiplying by the appropriate entry instead. (In check_babai we could just hoist the ldexp calls out of the loops, but that would not work in _fmpz_vec_get_d_vec_2exp.)
This just needs to be done carefully to make sure we don't violate any assumptions about the range of the doubles that are used in LLL.
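A minimal sketch of the table idea, purely illustrative (the table name, bounds and init function below are hypothetical, not existing FLINT API): precompute 2^e once for the exponent range LLL can actually produce, then replace each ldexp(x, e) with a lookup and a multiply.

#include <assert.h>
#include <math.h>

/* Hypothetical exponent bounds; they would have to be chosen so that every
   exponent LLL passes to ldexp stays inside the table and the table entries
   themselves are normal doubles. */
#define D_2EXP_MIN (-1021)
#define D_2EXP_MAX 1021

static double d_2exp_tab[D_2EXP_MAX - D_2EXP_MIN + 1];

/* Fill the table once, e.g. during LLL setup. */
static void d_2exp_tab_init(void)
{
    int e;
    for (e = D_2EXP_MIN; e <= D_2EXP_MAX; e++)
        d_2exp_tab[e - D_2EXP_MIN] = ldexp(1.0, e);
}

/* Table-based replacement for ldexp(x, e) within the supported range. */
static inline double d_mul_2exp_tab(double x, int e)
{
    assert(e >= D_2EXP_MIN && e <= D_2EXP_MAX);
    return x * d_2exp_tab[e - D_2EXP_MIN];
}

The multiply itself can of course still overflow or underflow if x * 2^e leaves the double range, which is exactly the range assumption mentioned above.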
Also, d_vec_dot and _d_vec_norm could be optimised by unrolling, to allow more of the multiplications to run in parallel. However, before implementing such a change we should verify that it doesn't slow things down for short vectors.
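As a sketch of the unrolling, with a hypothetical function name rather than the existing d_vec code: several independent accumulators let the multiply-adds overlap, and the tail loop also handles short vectors, which is the case to benchmark.

#include <flint/flint.h>  /* for slong */

/* Hypothetical 4-way unrolled dot product; s0..s3 are independent, so the
   multiplications and additions can proceed in parallel. */
static double _d_vec_dot_unrolled(const double * a, const double * b, slong len)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    slong i;

    for (i = 0; i + 4 <= len; i += 4)
    {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Tail: covers len not divisible by 4, and short vectors entirely. */
    for ( ; i < len; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}

Note that this changes the order of the additions, so the result can differ from the straight loop in the last bits; whether that matters for the floating-point heuristics in LLL would need checking.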
Isn't ldexp a trivial function for "normal" inputs and outputs? It should just be a mask, move to GPR, add, move back to SIMD register, right?
I already implemented some of these optimisations. See the helper functions in double_extras.
You would think that the built-in ldexp is fast, but that is not at all the case.
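For reference, a rough micro-benchmark like the following (entirely illustrative: the helper name, loop count and timing harness are made up for this sketch) is enough to see the gap between the libm ldexp and the bit-manipulation version on most systems.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Same idea as d_mul_2exp below: add e directly to the IEEE 754 exponent
   field. Only valid for normal inputs whose result neither overflows nor
   underflows. */
static inline double mul_2exp_bits(double x, int e)
{
    union { double f; uint64_t i; } u;
    u.f = x;
    u.i += (uint64_t) e << 52;
    return u.f;
}

int main(void)
{
    volatile double acc = 0.0;  /* volatile keeps the loops from being removed */
    double x = 1.2345;
    long i, n = 100000000;
    clock_t t;

    t = clock();
    for (i = 0; i < n; i++)
        acc += ldexp(x, (int) (i & 15));
    printf("ldexp:          %.2fs\n", (double) (clock() - t) / CLOCKS_PER_SEC);

    t = clock();
    for (i = 0; i < n; i++)
        acc += mul_2exp_bits(x, (int) (i & 15));
    printf("bit twiddling:  %.2fs\n", (double) (clock() - t) / CLOCKS_PER_SEC);

    return 0;
}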
/* Multiply x by 2^i by adding i directly to the IEEE 754 exponent field;
   valid only when x is normal and the result neither overflows nor
   underflows. */
FLINT_FORCE_INLINE double d_mul_2exp(double x, int i)
{
    double_uint64_u u;

    u.f = x;
    u.i += ((uint64_t) i) << D_EXPONENT_SHIFT;
    return u.f;
}
I believe something like this would work, no?
Edit: Never mind, I just rewrote what is already in the header.