
split_scalars_kernel kernel function

Open TalDerei opened this issue 1 year ago • 0 comments

The for loop in this kernel could be eliminated by using cooperative groups. Instead of a single thread looping over all the limbs of a single scalar, multiple threads could access different limbs (or sub-parts of the same limb) of the same scalar in parallel. This would require refactoring the arithmetic to support multi-threaded field operations. It is a longer-term optimization that is worth looking into if it fits the codebase.
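To make the idea concrete, here is a rough sketch of what a tile-per-scalar split might look like. It is not the existing `split_scalars_kernel`: the limb layout (`NUM_LIMBS` 32-bit limbs per scalar), the window width (`WINDOW_BITS`), and the names `get_digit`, `split_scalars_tiled`, and `split_on_gpu` are all assumptions made for illustration. It covers only the read-and-split step, which needs no carries between threads; multi-threaded field arithmetic would need more work than this.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical layout: a 256-bit scalar stored as 8 little-endian 32-bit limbs.
constexpr unsigned NUM_LIMBS = 8;
constexpr unsigned WINDOW_BITS = 16;  // hypothetical window width, must be < 32
constexpr unsigned NUM_WINDOWS = (NUM_LIMBS * 32 + WINDOW_BITS - 1) / WINDOW_BITS;
constexpr unsigned TILE = 16;         // threads cooperating on one scalar
static_assert(TILE >= NUM_WINDOWS, "need one lane per window");

// Extract window w of a scalar; a window may straddle two adjacent limbs.
__host__ __device__ inline uint32_t get_digit(const uint32_t* limbs, unsigned w) {
  unsigned bit = w * WINDOW_BITS;
  unsigned limb = bit / 32, shift = bit % 32;
  uint64_t pair = limbs[limb];
  if (limb + 1 < NUM_LIMBS) pair |= static_cast<uint64_t>(limbs[limb + 1]) << 32;
  return static_cast<uint32_t>(pair >> shift) & ((1u << WINDOW_BITS) - 1u);
}

// Each tile of TILE threads handles one scalar; each lane handles one window,
// so there is no per-thread loop over the scalar.
__global__ void split_scalars_tiled(const uint32_t* scalars, uint32_t* digits, unsigned n) {
  cg::thread_block block = cg::this_thread_block();
  auto tile = cg::tiled_partition<TILE>(block);
  unsigned tiles_per_block = static_cast<unsigned>(tile.meta_group_size());
  unsigned scalar_idx = blockIdx.x * tiles_per_block +
                        static_cast<unsigned>(tile.meta_group_rank());
  if (scalar_idx >= n) return;
  unsigned w = static_cast<unsigned>(tile.thread_rank());
  if (w < NUM_WINDOWS)
    digits[scalar_idx * NUM_WINDOWS + w] = get_digit(scalars + scalar_idx * NUM_LIMBS, w);
}

// Host helper for trying the kernel out (CUDA error checking omitted for brevity).
std::vector<uint32_t> split_on_gpu(const std::vector<uint32_t>& scalars) {
  unsigned n = static_cast<unsigned>(scalars.size() / NUM_LIMBS);
  std::vector<uint32_t> digits(static_cast<size_t>(n) * NUM_WINDOWS);
  if (n == 0) return digits;
  uint32_t *d_scalars = nullptr, *d_digits = nullptr;
  cudaMalloc(&d_scalars, scalars.size() * sizeof(uint32_t));
  cudaMalloc(&d_digits, digits.size() * sizeof(uint32_t));
  cudaMemcpy(d_scalars, scalars.data(), scalars.size() * sizeof(uint32_t),
             cudaMemcpyHostToDevice);
  constexpr unsigned THREADS = 128;                  // 8 tiles per block
  constexpr unsigned TILES_PER_BLOCK = THREADS / TILE;
  unsigned blocks = (n + TILES_PER_BLOCK - 1) / TILES_PER_BLOCK;
  split_scalars_tiled<<<blocks, THREADS>>>(d_scalars, d_digits, n);
  cudaMemcpy(digits.data(), d_digits, digits.size() * sizeof(uint32_t),
             cudaMemcpyDeviceToHost);
  cudaFree(d_scalars);
  cudaFree(d_digits);
  return digits;
}
```

Whether this would actually be faster probably depends on how the scalars are laid out in memory and on how much of the kernel's time is spent in the loop, so it would need profiling before committing to the refactor.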

TalDerei avatar Mar 17 '23 22:03 TalDerei