flint Low-precision multiplication of high-precision exact balls is slower than it should

As discussed in https://github.com/sagemath/sage/issues/39845:

~/co/flint:main$ cat arb_mul_exact.c
#include "flint.h"
#include "arb.h"
#include "profiler.h"

int main(void)
{
    flint_rand_t state;
    fmpz_t x;
    arb_t a, b;
    flint_rand_init(state);
    fmpz_init(x);
    arb_init(a);
    arb_init(b);
    fmpz_randbits(x, state, 1000000);
    arb_set_fmpz(a, x);
    /* arb_set_round(a, a, 256); */
    TIMEIT_START;
    arb_mul(b, a, a, 128);
    TIMEIT_STOP;
    arb_clear(a);
    arb_clear(b);
    fmpz_clear(x);
    flint_rand_clear(state);
    flint_cleanup_master();
    return 0;
}
~/co/flint:main$ gcc -I$PWD/src -L$PWD arb_mul_exact.c -lflint && LD_LIBRARY_PATH=$PWD ./a.out
cpu/wall(s): 0.00215 0.00215

The issue goes away if I remove the check that xn < MUL_MPFR_MAX_LIMBS on line 180 of arf/mul_rnd_down.c (should it be something like prec/FLINT_BITS < MUL_MPFR_MAX_LIMBS instead?).

Apr 04 '25 07:04 mezzarobba

So the issue is that it is slow due to MPFR's multiplication?

Apr 04 '25 08:04 albinahlback

No, I think it is slow because it takes the branch that does not use MPFR and that branch does a full multiplication instead of a truncated one.

Apr 04 '25 08:04 mezzarobba

Yes, this is a known performance bug: various operations including multiplication currently use the full precision inputs even when the output precision is much smaller.

Apr 04 '25 09:04 fredrik-johansson

In this particular case, since arf_mul_via_mpfr does truncate the operands to the output precision, it seems to me that arf_mul_rnd_down could test that both the largest operand and the output precision are smaller than the threshold. I am missing something?

Apr 06 '25 11:04 mezzarobba