Albin Ahlbäck
I don't know. If people have a problem with it, I'm down for changing it.
Seems like a reasonable fix before releasing 3.2.0. However, we would have to make sure that this is consistent throughout FLINT. Moreover, it seems like some docstrings do not provide...
Related to #2094
> Quick timings:
>
> ...
>
> Looks worth extending a bit beyond 16.

Perhaps I have to fine-tune the thing first. Seems a little bit slow, but...
> By the way, we often call `mpn_aors_n(rp, xp, yp, n)` with the same operand for `rp` and `xp`. Would functions specialized for this case gain anything?

For ARM, no....
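To make the question concrete, here is a minimal portable sketch of the two variants, assuming 64-bit limbs: a generic `rp = xp + yp` over `n` limbs that tolerates `rp == xp`, and a specialized in-place `rp += yp`. The names `ref_add_n` and `ref_add_n_inplace` are hypothetical and are not FLINT or GMP functions. In portable C both loops do the same work, so any gain from specializing would have to come from the assembly (for instance, one fewer pointer argument), which this sketch does not measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* assumption: 64-bit limbs */

/* Generic sketch: rp[0..n) = xp[0..n) + yp[0..n), returns the carry out.
   Aliasing rp == xp is safe because xp[i] is read before rp[i] is written. */
static limb_t ref_add_n(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++)
    {
        limb_t s = xp[i] + cy;
        limb_t c1 = s < cy;   /* carry from adding the incoming carry */
        limb_t r = s + yp[i];
        limb_t c2 = r < s;    /* carry from adding yp[i] */
        rp[i] = r;
        cy = c1 | c2;         /* at most one of c1, c2 can be set */
    }
    return cy;
}

/* Hypothetical in-place specialization: rp[0..n) += yp[0..n), returns the carry out. */
static limb_t ref_add_n_inplace(limb_t *rp, const limb_t *yp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++)
    {
        limb_t s = rp[i] + cy;
        limb_t c1 = s < cy;   /* carry from adding the incoming carry */
        limb_t r = s + yp[i];
        rp[i] = r;
        cy = c1 | (r < s);    /* combine with the carry from adding yp[i] */
    }
    return cy;
}
```

Calling `ref_add_n(x, x, y, n)` and `ref_add_n_inplace(x, y, n)` on the same inputs produces the same limbs and the same carry.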
> BTW: I think I mentioned this before: add_sssss.... macros seem to be inefficient for large n because the compiler doesn't know how to interleave the move and add instructions....
Keep in mind that the speedup for addition is not very significant in itself. Multiplication is a more important routine to optimize since it is computationally heavy, where optimizing it can...
Note to self: If we want n > 16, we need to shift `ap`, `bp` and `rp` so that the instruction encodings do not get longer. With offsets larger than 127,...
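The 127 limit comes from how x86-64 encodes memory operands: a displacement that fits in a signed byte (disp8, range -128 to 127) takes one byte, otherwise a four-byte disp32 is needed. With 8-byte limbs at offsets `8*i`, limb 16 sits at offset 128, so n = 16 is the last size where every offset fits. The sketch below uses a hypothetical helper `max_limbs_disp8` to count how many limbs stay within disp8 range when the base pointer is moved forward by `bias` bytes; a bias of 128 bytes (16 limbs) is just one possible shift, assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a byte offset can be encoded as a signed 8-bit displacement. */
static bool fits_disp8(long off)
{
    return off >= -128 && off <= 127;
}

/* Hypothetical helper: number of consecutive 8-byte limbs i = 0, 1, 2, ...
   whose offset 8*i - bias, relative to a base pointer moved forward by
   `bias` bytes, still fits in disp8. */
static size_t max_limbs_disp8(long bias)
{
    size_t n = 0;
    while (fits_disp8(8L * (long) n - bias))
        n++;
    return n;
}
```

Without a shift, `max_limbs_disp8(0)` gives 16; with the base moved forward by 128 bytes, `max_limbs_disp8(128)` gives 32, so under this assumption shifting `ap`, `bp` and `rp` by 16 limbs up front would keep every offset up to n = 32 in the short encoding.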
This will fail on ARM, but now `aorsrsh` is implemented for x86.
But does it really yield improvements? I don't know if I would be happy having these merged.