Optimise double operations in LLL
LLL will (at least in some cases) spend a significant portion of its time in the ldexp() calls in check_babai, _fmpz_vec_get_d_vec_2exp and possibly elsewhere. We could speed this up by precomputing a big table of power-of-two doubles and multiplying by the appropriate entry instead. (In check_babai we could just hoist the ldexp calls out of the loops, but that would not work in _fmpz_vec_get_d_vec_2exp.)
This just needs to be done carefully to make sure we don't violate any assumptions about the range of the doubles that are used in LLL.
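A minimal sketch of the table idea, purely illustrative (the table name, bounds and init function below are hypothetical, not existing FLINT API): precompute 2^e once for the exponent range LLL can actually produce, then replace each ldexp(x, e) with a lookup and a multiply.

#include <assert.h>
#include <math.h>

/* Hypothetical exponent bounds; they would have to be chosen so that every
   exponent LLL passes to ldexp stays inside the table and the table entries
   themselves are normal doubles. */
#define D_2EXP_MIN (-1021)
#define D_2EXP_MAX 1021

static double d_2exp_tab[D_2EXP_MAX - D_2EXP_MIN + 1];

/* Fill the table once, e.g. during LLL setup. */
static void d_2exp_tab_init(void)
{
    int e;
    for (e = D_2EXP_MIN; e <= D_2EXP_MAX; e++)
        d_2exp_tab[e - D_2EXP_MIN] = ldexp(1.0, e);
}

/* Table-based replacement for ldexp(x, e) within the supported range. */
static inline double d_mul_2exp_tab(double x, int e)
{
    assert(e >= D_2EXP_MIN && e <= D_2EXP_MAX);
    return x * d_2exp_tab[e - D_2EXP_MIN];
}

The multiply itself can of course still overflow or underflow if x * 2^e leaves the double range, which is exactly the range assumption mentioned above.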
Also, d_vec_dot and _d_vec_norm could be optimised by unrolling, to allow more of the multiplications to run in parallel. However, before implementing such a change we should verify that it doesn't slow things down for short vectors.
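As a sketch of the unrolling, with a hypothetical function name rather than the existing d_vec code: several independent accumulators let the multiply-adds overlap, and the tail loop also handles short vectors, which is the case to benchmark.

#include <flint/flint.h>  /* for slong */

/* Hypothetical 4-way unrolled dot product; s0..s3 are independent, so the
   multiplications and additions can proceed in parallel. */
static double _d_vec_dot_unrolled(const double * a, const double * b, slong len)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    slong i;

    for (i = 0; i + 4 <= len; i += 4)
    {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Tail: covers len not divisible by 4, and short vectors entirely. */
    for ( ; i < len; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}

Note that this changes the order of the additions, so the result can differ from the straight loop in the last bits; whether that matters for the floating-point heuristics in LLL would need checking.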
Isn't ldexp a trivial function for "normal" inputs and outputs? It should just be a mask, move to GPR, add, move back to SIMD register, right?
I already implemented some of these optimisations. See the helper functions in double_extras.
You would think that the built-in ldexp is fast, but that is not at all the case.
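For reference, a rough micro-benchmark like the following (entirely illustrative: the helper name, loop count and timing harness are made up for this sketch) is enough to see the gap between the libm ldexp and the bit-manipulation version on most systems.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Same idea as d_mul_2exp below: add e directly to the IEEE 754 exponent
   field. Only valid for normal inputs whose result neither overflows nor
   underflows. */
static inline double mul_2exp_bits(double x, int e)
{
    union { double f; uint64_t i; } u;
    u.f = x;
    u.i += (uint64_t) e << 52;
    return u.f;
}

int main(void)
{
    volatile double acc = 0.0;  /* volatile keeps the loops from being removed */
    double x = 1.2345;
    long i, n = 100000000;
    clock_t t;

    t = clock();
    for (i = 0; i < n; i++)
        acc += ldexp(x, (int) (i & 15));
    printf("ldexp:          %.2fs\n", (double) (clock() - t) / CLOCKS_PER_SEC);

    t = clock();
    for (i = 0; i < n; i++)
        acc += mul_2exp_bits(x, (int) (i & 15));
    printf("bit twiddling:  %.2fs\n", (double) (clock() - t) / CLOCKS_PER_SEC);

    return 0;
}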
/* Multiply x by 2^i by adding i directly to the IEEE 754 exponent field;
   valid only when x is normal and the result neither overflows nor
   underflows. */
FLINT_FORCE_INLINE double d_mul_2exp(double x, int i)
{
    double_uint64_u u;

    u.f = x;
    u.i += ((uint64_t) i) << D_EXPONENT_SHIFT;
    return u.f;
}
I believe something like this would work, no?
Edit: Never mind, I just rewrote what is already in the header.