algebra Merge celo/zexe w/ arkworks

Features:

Batch Affine ops
Batch and ordinary w-NAF
GLV framework. GLV precomputation script.
GLV impl for both batch affine and projective mul
Batch bucket addition tree
Batched MSM (1.4-1.7x bump)
Batch subgroup verification based on addition to random buckets
12-limb .s asm impl for montgomery mul and constant-time add/sub ~~2D test matrix feature-gating with axes (curve, test type)~~ ~~Refactor tests to use common template~~ ~~Code timing instrumentation with fine grained control of functions and callers~~

Combined speedup (with w-NAF) for:

Batched mul (with GLV): 2.3-2.7x.
Non-batch projective mul (with GLV): 2.1x.
Without GLV: 1.65-1.8x and 1.4x resp.

Speedup for subgroup verification:

Over old impl 25-80x.
Over new batch mul: 10-30x.

Spec sheet (also included in code comments): https://hackmd.io/@celo/S10UeWUBD

I am not willing to break this PR into smaller PRs as that was already done in Zexe, and now I have had to repeat work to merge again to arkworks, so it's too much work, no thank you.

TODOs:

Add glv generation script.
Sister PR for curves which implement glv, cuda, test feature gating, tests for new features etc.
Feature gate for BigInt768 asm (.s rather than ff-asm).

Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why.

[ ] Targeted PR against correct branch (main)
[X] Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
[X] Wrote unit tests
[X] Updated relevant documentation in the code
[ ] Added a relevant changelog entry to the Pending section in CHANGELOG.md
[ ] Re-reviewed Files changed in the Github PR explorer

Dec 11 '20 07:12 jon-chuang

Thanks for the PR and adapting it to arkworks!

Quick question: is the GPU scalar multiplication used to implement any operations (fixed/variable base MSMs/batch subgroup checks/batch muls)? I couldn't figure it out at a quick glance.

Also, I'd really appreciate it if you could you could point out the parts that need the closest review. (Since this is a large PR, if you could suggest a "review plan" so that I can figure out how to navigate the PR, that would also be great.)

Dec 11 '20 08:12 Pratyush

Thanks for working on adapting these great changes to arkworks and upstreaming them!

I'd like to ideally split this PR into many smaller PR's for easier review, would you like me to take a stab at doing this, or would you like to?

To start with, I'd like to move the ark-serialize changes into its own PR, since that should be modular Similary, a separate PR for the bw6 ASM

Dec 11 '20 16:12 ValarDragon

Ah sorry, didn't see the comment regarding not being willing to split up this PR. Is it fine if Pratyush and I go through and split up this into smaller PRs, and review/merge the smaller ones accordingly?

Dec 11 '20 16:12 ValarDragon

Hi thanks all for the comments. I apologise that I have been feeling rather fatigued wrt to the PR due to the tediousness and also its size.

With regards to areas about which I have some doubts,

the CUDA code could be improved, maybe some thinking about how we would like to organise the code/expose interfaces/conditionally compile CUDA support. Currently, there is a lot of intermixing of locations and dependencies, and no clean separation of duties. One thing I think could make the difference is to have a CUDAKernel trait that is only a bound for relevant parameters etc when conditionally compiled with feature = "cuda". This is very satisfactory to me

To be continued.

Jan 21 '21 06:01 jon-chuang

Thanks for the PR and adapting it to arkworks!

Quick question: is the GPU scalar multiplication used to implement any operations (fixed/variable base MSMs/batch subgroup checks/batch muls)? I couldn't figure it out at a quick glance.

So, this is specifically for batch multiplication. One thing I would like to improve is to create a more general framework for incorporating GPU kernels of all sorts, (including variablebaseMSM etc), which require instantiation at the parameter level due to needing to compile for specific curves without generics.

In the future, perhaps something so ugly will not have to be relied upon, e.g. if something like https://github.com/EmbarkStudios/rust-gpu/issues/387 comes to be.

Also, I'd really appreciate it if you could you could point out the parts that need the closest review. (Since this is a large PR, if you could suggest a "review plan" so that I can figure out how to navigate the PR, that would also be great.)

I will do this.

Jan 21 '21 06:01 jon-chuang

Currently I'm restarting this PR and will break into 4 components:

batch inversion group arithmetic
glv framework and scripts
things using batch inversion-based methods, like batch verification and batch MSM
cuda batch scalar mul

Number 4 is low priority as the code is spaghetti and also the API is very uncertain.

This PR is considered number 4, and I will stage the PRs so they build atop one another. Since this introduces breaking changes with curves, the corresponding curves changes will also be made.

I've decided I will complete this PR, as sickening as it is. I've already wasted enough time on porting this to arkworks that I decided I should waste even more time to at least get it done.

I'm very sickened that instead of having to write code and get things done, I have to waste time doing stupid refactoring and changing things everywhere. What a useless waste of time.

Mar 15 '21 13:03 jon-chuang

I'm thinking of migrating the GPU segments of the code towards rust-gpu, partly because

I don't feel this segment is particularly urgent
The workarounds to use accel and also the modifications requiring maintaining a segment of it are not worthwhile.
accel = spaghetti code
rust-gpu is obviously the future ;P

So there you have it. The timeline for rust-gpu to become mature enough coincides with my imagined timeline for the development of a robust gpu component to arkworks.

The major targets are compute intensive workloads like proving (MSM), batch scalar muls (currently implemented), batch verifications (on curve checks, signatures). Currently, such functionality centers around small number of functions.

Other potential things are FFTs.

Something else I would like to look into is how to combine something like rayon with rust-gpu.

Mar 20 '21 03:03 jon-chuang

algebra algebra copied to clipboard

Merge celo/zexe w/ arkworks

algebra
algebra copied to clipboard