Low-precision multiplication of high-precision exact balls is slower than it should be
As discussed in https://github.com/sagemath/sage/issues/39845:
~/co/flint:main$ cat arb_mul_exact.c
#include "flint.h"
#include "arb.h"
#include "profiler.h"
int main(void)
{
    flint_rand_t state;
    fmpz_t x;
    arb_t a, b;

    flint_rand_init(state);
    fmpz_init(x);
    arb_init(a);
    arb_init(b);

    fmpz_randbits(x, state, 1000000);
    arb_set_fmpz(a, x);
    /* arb_set_round(a, a, 256); */

    TIMEIT_START;
    arb_mul(b, a, a, 128);
    TIMEIT_STOP;

    arb_clear(a);
    arb_clear(b);
    fmpz_clear(x);
    flint_rand_clear(state);
    flint_cleanup_master();
    return 0;
}
~/co/flint:main$ gcc -I$PWD/src -L$PWD arb_mul_exact.c -lflint && LD_LIBRARY_PATH=$PWD ./a.out
cpu/wall(s): 0.00215 0.00215
The issue goes away if I remove the check that xn < MUL_MPFR_MAX_LIMBS on line 180 of arf/mul_rnd_down.c (should it be something like prec/FLINT_BITS < MUL_MPFR_MAX_LIMBS instead?).
So the issue is that it is slow due to MPFR's multiplication?
No, I think it is slow because it takes the branch that does not use MPFR and that branch does a full multiplication instead of a truncated one.
Yes, this is a known performance bug: various operations including multiplication currently use the full precision inputs even when the output precision is much smaller.
In this particular case, since arf_mul_via_mpfr does truncate the operands to the output precision, it seems to me that arf_mul_rnd_down could test that both the largest operand and the output precision are smaller than the threshold. Am I missing something?