algebra
algebra copied to clipboard
Optimized Pippenger
I assume that Zexe variable base multiexp is inherited from bellman. In section "New advances in multi-scalar-multiplication over elliptic curves" of this document by Aztec team it is shown that there is a 2x improvement.
Yeah, this is definitely a good thing to investigate; it would provide an immediate speed up to many downstream SNARK applications
Here is an implementation of the matter-labs team https://github.com/matter-labs/eip1962/blob/master/src/multiexp.rs, and the algo described here https://github.com/matter-labs/eip1962/blob/master/documentation/Algorithms_for_EIP1962.pdf (Algorithm 12)
Here is an implementation of the matter-labs team
It looks identical to bellman/zexe. Are you sure it uses the described trick?
Dusk PLONK uses zexe implementation, so i doubt a rust implementation already exists, only Aztec's in C++, see also
It looks identical to bellman/zexe. Are you sure it uses the described trick?
Did not check it, @swasilyev, since it was a new implementation I wanted just to share it, in case is valuable.
At least for the Edwards curves I'd think https://github.com/dalek-cryptography/curve25519-dalek/blob/master/src/backend/serial/scalar_mul/pippenger.rs comes close (see https://github.com/dalek-cryptography/curve25519-dalek/pull/249 and https://github.com/dalek-cryptography/curve25519-dalek/pull/259)
From what I understand, the impl in PLONK has some different components in the MSM, namely it directly computes with affine elements, and then does a batch inversion. I think the fork in o1-labs/Zexe has a partial impl of this, so it could be a good idea to try and upstream that
This affine based MSM with a radix sort of the bucket index followed by a vectorised addition tree is available in https://github.com/celo-org/zexe/pull/4
I assume that Zexe variable base multiexp is inherited from bellman. In section "New advances in multi-scalar-multiplication over elliptic curves" of this document by Aztec team it is shown that there is a 2x improvement.
Btw, ~2.2x is because 1.6-1.7x is due to adox/adcx. The other 1.4x is explained by the batch affine addition tree. Which we have achieved.
Results for 2^20 pippenger on 8-core Ryzen 4800H w/ 8MB L3 and 4.1GHz sustained all core boost, with SMT on: barretenberg: 520ms (490ms CPU) Zexe: 551ms.
The fat to be trimmed is not a Zexe issue, but lies in the radix sort used from an external library, which takes around 90ms compared to barretenberg's < 48ms. Once we trim the radix sort to be more cache friendly for multicore, hopefully we can close the gap.