Benchmark and accelerate `bn254` precompiles in `revm`
Background: `revm` uses the `substrate-bn` crate to implement the bn254 pairing precompile in the EVM. We recently added a `sys_bigint` syscall that performs uint256 `mulmod` and can be used to accelerate bn254 computations.
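For reference, a minimal sketch of how a guest program can call the syscall for a 256-bit modular multiplication; the `sys_bigint` signature and operand layout (eight little-endian u32 limbs, `op = 0` for mulmod) should be double-checked against the current `sp1-zkvm` crate:

```rust
// Sketch only: assumes sp1_zkvm exposes `sys_bigint` with this shape and that
// operands are 256-bit little-endian limb arrays, with op = 0 meaning mulmod.
use sp1_zkvm::syscalls::sys_bigint;

/// Computes (x * y) % modulus over 256-bit unsigned integers via the syscall.
fn mulmod_256(x: &[u32; 8], y: &[u32; 8], modulus: &[u32; 8]) -> [u32; 8] {
    let mut result = [0u32; 8];
    unsafe {
        sys_bigint(
            &mut result as *mut [u32; 8],
            0, // op = 0 selects multiplication; the result is reduced mod `modulus`
            x as *const [u32; 8],
            y as *const [u32; 8],
            modulus as *const [u32; 8],
        );
    }
    result
}
```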
- [ ] Profile `revm` bn254 precompile (`add`, `mul`, `pairing`) zkVM performance by writing an example program (can use one of their test vectors) that exercises the various bn254 precompiles and looking at the cycle count (a minimal sketch follows this list).
- [ ] Patch `substrate-bn` (https://github.com/paritytech/bn) to use our `sys_bigint` precompile (or other existing bn254 precompiles; we have ones for `add` and `double` here) to accelerate its performance.
- [ ] Benchmark the reduced cycle count of the example programs.
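A rough sketch of what the guest side of the profiling example could look like, assuming a standard SP1 guest crate layout with `substrate-bn` (lib name `bn`) as a dependency; the operations mirror the rows in the benchmark tables below:

```rust
// Guest program sketch: exercises bn254 group operations and the pairing from
// the `bn` (substrate-bn) crate so their cycle counts show up in the
// execution report. Uses the stock substrate-bn API (`G1`, `G2`, `Fr`, `pairing`).
#![no_main]
sp1_zkvm::entrypoint!(main);

use bn::{pairing, Fr, Group, G1, G2};

pub fn main() {
    let p = G1::one();
    let q = G2::one();
    let s = Fr::from_str("123456789").expect("valid scalar");

    let _sum = p + p;        // G1 addition
    let _prod = p * s;       // G1 scalar multiplication
    let _gt = pairing(p, q); // Miller loop + final exponentiation
}
```

On the host side the ELF can be run through the `sp1-sdk` executor and the total instruction count read from the execution report; that is the kind of cycle count reported in the tables below.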
Also, some helpful context from the folks at Nebra on whether to accelerate `substrate-bn` or swap in an `arkworks` implementation, which might be easier to accelerate with `sys_bigint`. They already have a branch that adds a `sys_bigint` backend to `arkworks`, which can be seen here.
> ...[substrate-bn] seems to have Montgomery representation hard-coded into it. This was also the case with the halo2 code, which was the other lib I considered. Given the mulmod syscall, which works with plain bigints (non-Montgomery), it seemed like sticking with plain representation would be the best (IIRC it's only muls that benefit from Montgomery form, so it would add unnecessary overhead inside the VM if we kept that form). The arkworks impl has a "backend" component factored out, so it seemed easiest to just provide a plain (non-Montgomery) backend.
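To make the representation point concrete: with a mulmod syscall that works on canonical (non-Montgomery) bigints, a plain-form field multiplication is a single syscall, whereas a Montgomery backend would spend extra in-VM multiplications converting operands into and out of Montgomery form. A purely illustrative sketch (the `Fq` wrapper is hypothetical; `mulmod_256` refers to the helper sketched above):

```rust
// Illustrative only: a plain-representation bn254 base-field element whose
// multiplication maps directly onto one mulmod syscall, with no Montgomery
// conversions. `mulmod_256` is the helper from the earlier sketch.
const BN254_MODULUS: [u32; 8] = [
    // bn254 base-field prime, little-endian u32 limbs
    0xd87cfd47, 0x3c208c16, 0x6871ca8d, 0x97816a91,
    0x8181585d, 0xb85045b6, 0xe131a029, 0x30644e72,
];

#[derive(Clone, Copy)]
struct Fq([u32; 8]); // canonical (non-Montgomery) limbs

impl Fq {
    fn mul(self, other: Fq) -> Fq {
        Fq(mulmod_256(&self.0, &other.0, &BN254_MODULUS))
    }
}
```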
@puma314 Should the revm profiling example code be done here or in the revm codebase?
I got some rough numbers (single sample) generated with https://github.com/m-kus/sp1-bn254-benchmark:

| Operation | # cycles |
|---|---|
| G1 decoding (uncompressed) | 10,194 |
| G1 encoding | 119,182 |
| G1 addition | 30,034 |
| G1 multiplication | 1,735,257 |
| G2 decoding (uncompressed) | 27,958,488 |
| G2 encoding | 162,243 |
| G2 addition | 104,122 |
| G2 multiplication | 31,221,329 |
| Miller loop | 33,175,157 |
| Final exponentiation | 45,323,747 |
Substrate BN is widely used (and will probably continue to be, since it's time/battle-tested), and ZK friendliness alone might not be enough incentive to swap to arkworks, given that it's extra work (I might be wrong, though).
I got similar results when using `substrate-bn`. I tried `arkworks` as-is, and also patched it using the approach mentioned above that relies on the `mulmod` precompile.

Here are the results (table copied from @m-kus for easier comparison):
| Operation | substrate-bn | arkworks | arkworks (patched) |
|---|---|---|---|
| G1 decoding (uncompressed) | 10,194 | n/a | n/a |
| G1 encoding | 119,182 | n/a | n/a |
| G1 addition | 30,034 | 17,134 | 3,964 |
| G1 multiplication | 1,735,257 | 4,494,212 (?) | 918,631 |
| G2 decoding (uncompressed) | 27,958,488 | n/a | n/a |
| G2 encoding | 162,243 | n/a | n/a |
| G2 addition | 104,122 | 49,403 | 17,109 |
| G2 multiplication | 31,221,329 | 13,965,273 | 4,667,998 |
| Miller loop | 33,175,157 | 16,487,911 | 6,391,641 |
| Final exponentiation | 45,323,747 | 16,875,843 | 8,165,645 |
Note: there's a precompile for short Weierstrass point addition, and when using it directly, the cycle count for "G1 addition" drops to ~1k.
I'll post a link to the benchmark repo in the coming days.
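Regarding the note above: the direct path to the point-addition precompile looks roughly like the following. The `syscall_bn254_add` name comes from `sp1-zkvm`, but the exact pointer types and limb layout (affine x||y as little-endian words) vary between versions and should be checked before use:

```rust
// Sketch only: calls the bn254 point-addition precompile directly on two
// affine G1 points, each assumed to be laid out as 16 u32 words (x limbs then
// y limbs, little-endian). The syscall is assumed to write the sum back into
// `p`; adjust the pointer casts to match the signature in your sp1-zkvm version.
use sp1_zkvm::syscalls::syscall_bn254_add;

fn g1_add_in_place(p: &mut [u32; 16], q: &[u32; 16]) {
    unsafe {
        syscall_bn254_add(p.as_mut_ptr(), q.as_ptr());
    }
}
```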
I was able to patch `substrate-bn` and switch from Montgomery form to the plain representation while keeping the same API. The final results are:
| Operation | substrate-bn | substrate-bn-sp1 (patched) |
|---|---|---|
| G1 decoding (uncompressed) | 10,194 | 2,022 |
| G1 encoding | 119,182 | 101,621 |
| G1 addition | 30,034 | 5,301 |
| G1 multiplication | 1,735,257 | 402,823 |
| G2 decoding (uncompressed) | 27,958,488 | 7,798,137 |
| G2 encoding | 162,243 | 117,460 |
| G2 addition | 104,122 | 31,819 |
| G2 multiplication | 31,221,329 | 8,644,149 |
| Miller loop | 33,175,157 | 9,449,627 |
| Final exponentiation | 45,323,747 | 14,574,801 |
| `revm_precompile::bn128::run_add` | 168,171 | 113,580 |
| `revm_precompile::bn128::run_mul` | 1,865,971 | 506,454 |
| `revm_precompile::bn128::run_pair` | 213,099,695 | 63,800,732 |
Patched `substrate-bn` crate: https://github.com/m-kus/substrate-bn-sp1/pull/1
Benchmark sources: https://github.com/m-kus/sp1-bn254-benchmark
We are working on this internally...