Fix logic mismatch in `newR1C` for better Groth16 optimization
Previously, the comment suggested that the goal was to minimize the number of variables in the B matrix, but the implementation did the opposite — swapping L and R when L was longer. This PR corrects the condition to match the intended optimization logic, aligning with Groth16 performance best practices.
Fixes # (no issue number yet)
Type of change
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
How has this been tested?
- [x] Manual inspection of Groth16 output matrices
- [ ] Unit test (TBD - logic is straightforward but may benefit from regression test)
@ThomasPiellard can you check that one? I recall empirically, the code as is (before this PR) actually produce better perf
Quick heads-up on PR #1482 (“fix(update): Correct logic mismatch in newR1C for better Groth16 optimization”):
- Align the swap condition in
newR1Cwith the intended goal of minimizing B-matrix variables - Previously swapped L and R when L was longer; now only swaps when that actually reduces variable count
- Matches Groth16 performance best practices
PTAL @ThomasPiellard—let me know if empirical tests suggest otherwise! ⚙️
@gbotrel, @ThomasPiellard, @tomasandroil
I implemented following benchmark (in groth recursion package):
func BenchmarkBN254Commitment(b *testing.B) {
assert := test.NewAssert(b)
innerCcs, innerVK, innerWitness, innerProof := getInnerCommitment(assert, ecc.BN254.ScalarField(), ecc.BN254.ScalarField())
assert.Equal(len(innerCcs.GetCommitments().CommitmentIndexes()), 1)
// outer proof
circuitVk, err := ValueOfVerifyingKey[sw_bn254.G1Affine, sw_bn254.G2Affine, sw_bn254.GTEl](innerVK)
assert.NoError(err)
circuitWitness, err := ValueOfWitness[sw_bn254.ScalarField](innerWitness)
assert.NoError(err)
circuitProof, err := ValueOfProof[sw_bn254.G1Affine, sw_bn254.G2Affine](innerProof)
assert.NoError(err)
outerCircuit := &OuterCircuit[sw_bn254.ScalarField, sw_bn254.G1Affine, sw_bn254.G2Affine, sw_bn254.GTEl]{
Proof: PlaceholderProof[sw_bn254.G1Affine, sw_bn254.G2Affine](innerCcs),
InnerWitness: PlaceholderWitness[sw_bn254.ScalarField](innerCcs),
VerifyingKey: PlaceholderVerifyingKey[sw_bn254.G1Affine, sw_bn254.G2Affine, sw_bn254.GTEl](innerCcs),
}
outerAssignment := &OuterCircuit[sw_bn254.ScalarField, sw_bn254.G1Affine, sw_bn254.G2Affine, sw_bn254.GTEl]{
InnerWitness: circuitWitness,
Proof: circuitProof,
VerifyingKey: circuitVk,
}
r1cs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, outerCircuit)
assert.NoError(err)
witness, err := frontend.NewWitness(outerAssignment, ecc.BN254.ScalarField())
assert.NoError(err)
pk, err := groth16.DummySetup(r1cs)
assert.NoError(err)
for b.Loop() {
_, err := groth16.Prove(r1cs, pk, witness, GetNativeProverOptions(ecc.BN254.ScalarField(), ecc.BN254.ScalarField()))
assert.NoError(err)
}
}
Ran the benchmarks:
go test -benchmem -run=^$ -bench ^BenchmarkBN254Commitment$ github.com/consensys/gnark/std/recursion/groth16 -count=2 -benchtime=3x
Then the results are:
- old
goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/groth16
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
BenchmarkBN254Commitment-16 3 4439002251 ns/op 2403417824 B/op 10357157 allocs/op
BenchmarkBN254Commitment-16 3 4393044997 ns/op 2397711322 B/op 10354902 allocs/op
PASS
ok github.com/consensys/gnark/std/recursion/groth16 111.366s
- new
goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/groth16
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
BenchmarkBN254Commitment-16 3 4069276560 ns/op 2141616477 B/op 10352891 allocs/op
BenchmarkBN254Commitment-16 3 4073187323 ns/op 2147236154 B/op 10356136 allocs/op
PASS
ok github.com/consensys/gnark/std/recursion/groth16 110.955s
And diff (no confidence as low number of runs, but we see the picture):
goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/groth16
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
BN254Commitment-16 4.416 ± ∞ ¹ 4.071 ± ∞ ¹ ~ (p=0.333 n=2) ²
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
BN254Commitment-16 2.236Gi ± ∞ ¹ 1.997Gi ± ∞ ¹ ~ (p=0.333 n=2) ²
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
BN254Commitment-16 10.36M ± ∞ ¹ 10.35M ± ∞ ¹ ~ (p=0.667 n=2) ²
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
TLDR: I think it is good to merge. I'll only push a few cosmetic changes (documentation alignment).