Benchmark and accelerate `bn254` precompiles in `revm`

Open puma314 opened this issue 9 months ago • 3 comments

Background: revm uses the substrate-bn crate for implementing the bn254 pairing precompile in EVM.

We recently added a syscall sys_bigint that precompiles uint256 mulmod and can be used to accelerate bn254 computations.

  • [ ] Profile the zkVM performance of the revm bn254 precompiles (add, mul, pairing) by writing an example program (one of their test vectors can be reused) that exercises each precompile, and record the cycle count.
  • [ ] Patch substrate-bn (https://github.com/paritytech/bn) to use our sys_bigint precompile (or other existing bn254 precompiles; we have ones for add and double here) to accelerate its performance.
  • [ ] Benchmark the reduced cycle count of the example programs.

puma314 avatar May 13 '24 22:05 puma314

Also, some helpful context from the folks at Nebra on whether to accelerate substrate-bn or swap in an arkworks implementation, which might be easier to accelerate with sys_bigint. They already have a branch that adds a sys_bigint backend to arkworks, which can be seen here.

...[substrate-bn] seems to have Montgomery representation hard-coded into it. This was also the case with the halo2 code, which was the other lib I considered. Given the mulmod syscall, which works with plain bigints (non-Montgomery), it seemed like sticking with plain representation would be the best (IIRC it's only muls that benefit from Montgomery form, so it would add unnecessary overhead inside the VM if we kept that form). The arkworks impl has a "backend" component factored out, so it seemed easiest to just provide a plain (non-Montgomery) backend.
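To make the trade-off in the quote concrete, here is a minimal sketch contrasting plain and Montgomery modular multiplication. It is a toy: it uses 64-bit values instead of the 254-bit bn254 field, all function names are hypothetical, and it does not call sys_bigint or any code from substrate-bn or arkworks. The point it illustrates is only that, once a mulmod primitive on plain integers is available, the Montgomery path needs extra conversions in and out for the same result.

```rust
// Toy comparison of plain vs. Montgomery modular multiplication on 64-bit
// values. The real bn254 field is 254 bits; u64 is used only to keep the
// sketch short. All function names here are hypothetical.

/// Plain multiplication mod `m`: one widening multiply, one reduction.
/// This plays the role of a mulmod primitive operating on plain integers.
fn mulmod_plain(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

/// Returns -m^{-1} mod 2^64 for odd `m`, using Newton iteration.
fn neg_inv(m: u64) -> u64 {
    let mut inv: u64 = 1; // correct modulo 2 because m is odd
    for _ in 0..6 {
        // each step doubles the number of correct low bits: 1 -> 64
        inv = inv.wrapping_mul(2u64.wrapping_sub(m.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// Montgomery reduction: returns t * 2^{-64} mod m.
/// Requires m odd, m < 2^63, and t < m * 2^64.
fn redc(t: u128, m: u64, m_neg_inv: u64) -> u64 {
    let k = (t as u64).wrapping_mul(m_neg_inv);
    let r = ((t + k as u128 * m as u128) >> 64) as u64;
    if r >= m { r - m } else { r }
}

/// Converts a plain residue into Montgomery form (a * 2^64 mod m).
fn to_mont(a: u64, m: u64) -> u64 {
    (((a as u128) << 64) % m as u128) as u64
}

/// Converts a Montgomery-form value back to a plain residue.
fn from_mont(x: u64, m: u64, m_neg_inv: u64) -> u64 {
    redc(x as u128, m, m_neg_inv)
}

/// Multiplies two Montgomery-form values; the result stays in Montgomery form.
fn mont_mul(x: u64, y: u64, m: u64, m_neg_inv: u64) -> u64 {
    redc(x as u128 * y as u128, m, m_neg_inv)
}

fn main() {
    let m: u64 = (1 << 61) - 1; // an odd modulus below 2^63 (toy choice)
    let (a, b) = (1_234_567_890_123u64, 987_654_321_987u64);
    let ni = neg_inv(m);
    // Montgomery path: convert in, multiply, convert out.
    let via_mont = from_mont(mont_mul(to_mont(a, m), to_mont(b, m), m, ni), m, ni);
    // Plain path: a single mulmod.
    assert_eq!(via_mont, mulmod_plain(a, b, m));
    println!("a*b mod m = {}", via_mont);
}
```

Both paths agree on the product; the plain path is a single call, which is the shape a mulmod syscall accelerates directly.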

puma314 avatar May 13 '24 22:05 puma314

@puma314 Should the revm profiling example code be done here or in the revm codebase?

shadrach-tayo avatar May 18 '24 08:05 shadrach-tayo

I got some rough numbers (single sample) generated with https://github.com/m-kus/sp1-bn254-benchmark

| Operation | # cycles |
| --- | --- |
| G1 decoding (uncompressed) | 10,194 |
| G1 encoding | 119,182 |
| G1 addition | 30,034 |
| G1 multiplication | 1,735,257 |
| G2 decoding (uncompressed) | 27,958,488 |
| G2 encoding | 162,243 |
| G2 addition | 104,122 |
| G2 multiplication | 31,221,329 |
| Miller loop | 33,175,157 |
| Final exponentiation | 45,323,747 |

Substrate BN is widely used (and will probably continue to be used, since it is battle-tested), so ZK-friendliness alone might not be enough incentive to swap to arkworks, given the extra work involved. I might be wrong, though.

m-kus avatar May 21 '24 10:05 m-kus

I got similar results when using substrate-bn. I tried arkworks as is, and also patched it by using this approach (mentioned above) that relies on the mulmod precompile.

Here are the results (copied the table from @m-kus for a better comparison):

| Operation | substrate-bn | arkworks | arkworks (patched) |
| --- | --- | --- | --- |
| G1 decoding (uncompressed) | 10,194 | n/a | n/a |
| G1 encoding | 119,182 | n/a | n/a |
| G1 addition | 30,034 | 17,134 | 3,964 |
| G1 multiplication | 1,735,257 | 4,494,212 (?) | 918,631 |
| G2 decoding (uncompressed) | 27,958,488 | n/a | n/a |
| G2 encoding | 162,243 | n/a | n/a |
| G2 addition | 104,122 | 49,403 | 17,109 |
| G2 multiplication | 31,221,329 | 13,965,273 | 4,667,998 |
| Miller loop | 33,175,157 | 16,487,911 | 6,391,641 |
| Final exponentiation | 45,323,747 | 16,875,843 | 8,165,645 |

Note: there's a precompile for sw point addition; when it is used directly, the cycle count for "G1 addition" drops to ~1k.

I'll post a link to the benchmark repo in the next few days.

brozorec avatar May 31 '24 14:05 brozorec

I was able to patch substrate-bn and switch from Montgomery form to the plain representation while keeping the same API. The final results are:

| Operation | substrate-bn | substrate-bn-sp1 (patched) |
| --- | --- | --- |
| G1 decoding (uncompressed) | 10,194 | 2,022 |
| G1 encoding | 119,182 | 101,621 |
| G1 addition | 30,034 | 5,301 |
| G1 multiplication | 1,735,257 | 402,823 |
| G2 decoding (uncompressed) | 27,958,488 | 7,798,137 |
| G2 encoding | 162,243 | 117,460 |
| G2 addition | 104,122 | 31,819 |
| G2 multiplication | 31,221,329 | 8,644,149 |
| Miller loop | 33,175,157 | 9,449,627 |
| Final exponentiation | 45,323,747 | 14,574,801 |
| revm_precompile::bn128::run_add | 168,171 | 113,580 |
| revm_precompile::bn128::run_mul | 1,865,971 | 506,454 |
| revm_precompile::bn128::run_pair | 213,099,695 | 63,800,732 |

Patched substrate-bn crate: https://github.com/m-kus/substrate-bn-sp1/pull/1

Benchmark sources: https://github.com/m-kus/sp1-bn254-benchmark
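
For readers who want the improvement as a ratio rather than raw cycle counts, the following sketch derives speedup factors for the three revm precompile rows. The cycle counts are copied from the results above; the helper names (`speedup`, `percent_saved`) are made up for illustration and are not part of the benchmark repo.

```rust
// Cycle counts copied from the results above: (operation, unpatched, patched).
const ROWS: [(&str, u64, u64); 3] = [
    ("run_add", 168_171, 113_580),
    ("run_mul", 1_865_971, 506_454),
    ("run_pair", 213_099_695, 63_800_732),
];

/// Speedup factor: unpatched cycles divided by patched cycles.
fn speedup(before: u64, after: u64) -> f64 {
    before as f64 / after as f64
}

/// Percentage of cycles saved, relative to the unpatched count.
fn percent_saved(before: u64, after: u64) -> f64 {
    100.0 * (before - after) as f64 / before as f64
}

fn main() {
    for &(name, before, after) in ROWS.iter() {
        println!(
            "{:<10} {:>5.2}x faster, {:>5.1}% fewer cycles",
            name,
            speedup(before, after),
            percent_saved(before, after)
        );
    }
}
```

With these single-sample numbers, the pairing precompile comes out roughly 3.3x cheaper, run_mul does slightly better than that, and run_add improves the least.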

m-kus avatar Jun 04 '24 18:06 m-kus

We are working on this internally...

jtguibas avatar Jun 25 '24 00:06 jtguibas