double-and-add algorithm
The projective-coordinate addition operator is overloaded to compute scalar multiplication with a naive double-and-add algorithm. It would be worth specializing this operation by switching to Montgomery multiplication (CIOS) for more efficient modular multiplications.
Here's an efficient implementation: https://github.com/MinaProtocol/gpu-groth16-prover-3x/blob/master/multiexp/arith.cu#L289. It uses cooperative groups and thread ranks in CUDA. For your codebase, it would simply be a matter of swapping the 'fixnum' calls for equivalent PTX instructions implemented in 'ptx.cuh'.
Noted! Thanks Tal. We will improve the double-and-add. Does the Mina code implement the state of the art? BTW, feel free to open a PR.
cc: @DmytroTym, @mickeyasa
Sure! It's a stripped-down version of cuda-fixnum (an extended-precision modular arithmetic library) built by Hamish from Polygon Zero. It's an older repository, but it's performant: https://github.com/data61/cuda-fixnum.
I feel like two different things are at play here: modular multiplication, which is what Montgomery multiplication does, and single-scalar multiplication, which is what double-and-add does. Admittedly, we have room to improve in both. For modular multiplication, we want to see whether our scheme based on Barrett reduction can be competitive with Montgomery CIOS; if not, we'll switch to CIOS. We only have basic Barrett implemented for now. As for double-and-add, you're right that it's suboptimal; we're planning to move to more efficient algorithms in the near future.
To comment on this: we haven't implemented a more efficient single-scalar multiplication than double-and-add yet, but we don't really have any use cases for it, so it's still low priority.