constantine
constantine copied to clipboard
Implement Arbitrary Degree Karatsuba
The embedding curves for Snarks proof composition like BW6-761 have a large number of limbs (2x the embedded curve) by necessity. This makes the n² of schoolbook multiplication quite costly. Karatsuba and variants have a cost of n(n+1)/2 instead and so that factor scales twice as slow.
However plain Karatsuba requires the number of limbs be a power of 2. Scott 2015 reminds and gives practical algorithm to implement an arbitrary degree Karatsuba that captures most Karatsuba performance benefits while being flexible on the number of limbs, hence it can be applied to 12 limbs field like BW6-761.

Alternatively would it be possible to have one layer of Karatsuba multiplication for 6x6 -> 12 limbs and for the 6 limbs multiplications use the currently implemented fast methods?
References
- Missing a trick: Karatsuba variations
Mike Scott, 2015
https://eprint.iacr.org/2015/1247
Is this friendly to adcx/adox?
It's likely something I will explore a long time for now ;) but since I came across the paper again I just wanted to log my thoughts.
That said at first glance I expect the substraction will pollute the CF and OF flags so it will likely benefit only from mulx but not ADCX and ADOX.