solana-perf-libs

Potential sigverify latency optimization

Open • sakridge opened this issue • 0 comments

The ed25519 sigverify check computes a*A + b*B in a single thread. This is reasonably efficient on the CPU: it saves instructions, and spilling temporaries from registers to the stack (which stays in L1 cache) is not very expensive there.

On the GPU, where there are so many threads, one could instead compute a*A in one kernel launch and, in parallel, b*B, then do the final point addition at the end, which is cheap. Each launch would then occupy a larger portion of the GPU, and in low-batch situations I think this is preferable to leaving a large part of the GPU idle. Each scalar multiplication would also have much lower register pressure, since it only has half as many temporaries to keep live.
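To make the proposal concrete, here is a minimal Python sketch, assuming plain affine ed25519 arithmetic and a textbook Shamir's-trick joint multiplication as a stand-in for the current single-thread routine; it is not the solana-perf-libs kernel code, and all function names are illustrative. `double_scalar_mult` models the current approach, where A, B and A+B stay live together in one pass; `split_scalar_mult` models the proposal: two independent scalar multiplications followed by one point addition. Python threads only model the independence of the two halves (the GIL prevents any real speedup); on a GPU they would be separate threads or launches.

```python
from concurrent.futures import ThreadPoolExecutor

# Curve25519 field prime and the ed25519 twisted Edwards constant d (a = -1).
P = 2**255 - 19
D = (-121665 * pow(121666, P - 2, P)) % P
SQRT_M1 = pow(2, (P - 1) // 4, P)  # a square root of -1 mod P
IDENTITY = (0, 1)                  # neutral element in affine coordinates


def inv(x):
    # Modular inverse via Fermat's little theorem (P is prime).
    return pow(x, P - 2, P)


def recover_x(y):
    # Solve -x^2 + y^2 = 1 + d*x^2*y^2 for x, choosing the even root.
    u = (y * y - 1) * inv(D * y * y + 1) % P
    x = pow(u, (P + 3) // 8, P)
    if (x * x - u) % P != 0:
        x = x * SQRT_M1 % P
    return P - x if x % 2 else x


BASE_Y = 4 * inv(5) % P
BASE = (recover_x(BASE_Y), BASE_Y)  # the standard ed25519 base point


def add(p1, p2):
    # Complete (exception-free) affine addition; also valid for doubling.
    (x1, y1), (x2, y2) = p1, p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + y1 * x2) * inv(1 + t) % P
    y3 = (y1 * y2 + x1 * x2) * inv(1 - t) % P
    return (x3, y3)


def scalar_mult(k, pt):
    # Right-to-left double-and-add: one independent k*pt computation.
    acc, addend = IDENTITY, pt
    while k > 0:
        if k & 1:
            acc = add(acc, addend)
        addend = add(addend, addend)
        k >>= 1
    return acc


def double_scalar_mult(a, A, b, B):
    # Joint a*A + b*B (Shamir's trick): one shared doubling chain,
    # but A, B and A+B must all be kept live at the same time.
    ab = add(A, B)
    acc = IDENTITY
    for i in reversed(range(max(a.bit_length(), b.bit_length()))):
        acc = add(acc, acc)
        bits = ((a >> i) & 1, (b >> i) & 1)
        if bits == (1, 0):
            acc = add(acc, A)
        elif bits == (0, 1):
            acc = add(acc, B)
        elif bits == (1, 1):
            acc = add(acc, ab)
    return acc


def split_scalar_mult(a, A, b, B):
    # Proposed split: a*A and b*B share no state, so they can run
    # concurrently; a single cheap point addition combines them.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(scalar_mult, a, A)
        fb = pool.submit(scalar_mult, b, B)
        return add(fa.result(), fb.result())


A = scalar_mult(123456789, BASE)
a, b = 2**200 + 12345, 3**100
assert double_scalar_mult(a, A, b, BASE) == split_scalar_mult(a, A, b, BASE)
```

Both paths produce the same point. The split path does more total doublings (two chains instead of one shared chain), which is why it would only pay off when otherwise idle GPU threads absorb the extra work.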

It might even be worth keeping both options available and choosing between them based on batch size, in case one is more efficient for large batches and the other for small ones.
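Keeping both kernels around could look like the following host-side dispatch, sketched under the assumption that the crossover point is found by benchmarking both variants on the target GPU. `Strategy`, `choose_strategy`, `threads_launched` and `crossover_batch` are hypothetical names for illustration, not part of solana-perf-libs.

```python
from enum import Enum


class Strategy(Enum):
    JOINT = "joint"  # one thread per signature computes a*A + b*B
    SPLIT = "split"  # separate threads for a*A and b*B, then one addition


def threads_launched(batch_size: int, strategy: Strategy) -> int:
    # The split strategy keeps two scalar multiplications in flight per signature.
    return batch_size * (2 if strategy is Strategy.SPLIT else 1)


def choose_strategy(batch_size: int, crossover_batch: int) -> Strategy:
    # Hypothetical heuristic: below the measured crossover, the joint kernel
    # leaves much of the GPU idle, so doubling the thread count is worth it.
    return Strategy.SPLIT if batch_size < crossover_batch else Strategy.JOINT
```

For example, with a measured crossover of 4096 signatures, a batch of 64 would use the split kernels (128 threads), while a batch of 100000 would stay on the joint kernel.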

sakridge • Sep 15 '20 17:09