algebra icon indicating copy to clipboard operation
algebra copied to clipboard

WebGPU/Emu MSM/FFT

Open jon-chuang opened this issue 5 years ago • 3 comments

I hope to either use webgpu or emu to parallelise MSM/FFTs. I am not sure what the current implementation of MSM is, but we need to load-balance the different buckets. My target is a 2-10x speedup on a powerful GPU like GTX1060-6GB.

An idea I have which may or may not make sense is to perform the FFT in parallel on the CPU first, then when the butterflies are sufficiently small, move them into small kernels on the GPU. (can't remember which way around it is). Due to lack of data dependencies, we should be able to get very high streaming throughput.

Other problems to think about is how to package the library so that GPU can be compiled for if desired/detected. Shouldn't be too hard utilising some WebGPU/#[cfg] functionality.

jon-chuang avatar Apr 03 '20 06:04 jon-chuang

This would be great! I think starting with FFTs might be easier in the short term, just because it only requires field arithmetic. There is some prior work in the snarky library: https://github.com/o1-labs/snarky/pull/356

It could serve as a good starting point.

Pratyush avatar Apr 03 '20 07:04 Pratyush

I think we should add some auto-optimisation to decide how to split the problem between GPU and CPU/multicore or even distributed. Distributed is of value as it could lead to truly fast latencies - maybe 0.1s for offloading proving. This should be a separate issue. Can collab with RISELab maybe. I should look for some bayes opt in rust. I think probably something like Gaussian process optimisation might be fine. Should be relatively low-dimensional optimisation problem. As an alternative, the Simple(x) optimisation (rust crate) algorithm may be good.

jon-chuang avatar Apr 03 '20 07:04 jon-chuang

Heh, I'm also part of the RISElab, so that front could be somewhat easier to explore

Pratyush avatar Apr 03 '20 08:04 Pratyush