WIP: HyperKZG verifier circuit
This is a work in progress with the goal of building a complete Jolt verifier circuit and proving it with Groth16 or Polymath.
The ultimate goal is to implement #371.
HyperKZG verifier circuit
- [x] define CommitmentVerifierGadget trait based on CommitmentScheme trait
- [x] look at how SNARKGadget and CommitmentGadget are structured
- [x] VK could be modeled after Groth16VerifierGadget::ProcessedVerifyingKeyVar
- [x] design the first iteration of the gadget
- [x] Implement PairingGadget

  We can't rely (as arkworks' PairingVar does) on the circuit's native field being the base field of the pairing curve used by the HyperKZG verifier. The verifier circuit's native field is the scalar field of BN254 or BLS12-381 (the most widely available pairing curves), and unfortunately there are no pairing-friendly curves whose base field is the scalar field of either BN254 or BLS12-381. (A rough sketch of the resulting gadget shape follows this task list.)

  - [x] Port an existing pairing (BLS12-381) from arkworks circuits
- [x] implement HyperKZGVerifierGadget
- [ ] test cases
  - [x] ➡️ test_hyperkzg_eval
    - [x] commit using native HyperKZG
    - [x] verify using native HyperKZG
    - [x] verify in a circuit using the HyperKZGVerifierGadget
  - [ ] test_hyperkzg_small
  - [ ] test_hyperkzg_large
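For reference, here's a minimal sketch of what that decoupling could look like; the trait and associated-type names are illustrative, not necessarily the actual `PairingGadget` API in this PR:

```rust
use ark_ec::pairing::Pairing;
use ark_ff::PrimeField;
use ark_r1cs_std::{fields::FieldVar, groups::CurveVar};
use ark_relations::r1cs::SynthesisError;

/// Unlike arkworks' `PairingVar`, which assumes the constraint field is the
/// pairing curve's base field, this trait is generic over an arbitrary
/// `ConstraintF`, so all G1/G2 coordinate arithmetic is emulated non-natively.
pub trait PairingGadget<E: Pairing, ConstraintF: PrimeField> {
    type G1Var: CurveVar<E::G1, ConstraintF>;
    type G2Var: CurveVar<E::G2, ConstraintF>;
    type GTVar: FieldVar<E::TargetField, ConstraintF>;

    /// Computes e(p, q) inside the constraint system.
    fn pairing(p: &Self::G1Var, q: &Self::G2Var) -> Result<Self::GTVar, SynthesisError>;
}
```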
I'm not sure of the benefit of looking at MNT4 or BLS12-381, given that our main goal is on-chain verification on Ethereum, which means we need to run Groth16 over BN254?

We could just run both HyperKZG and Groth16 over BN254, but this would obviously mean that all the pairings and scalar multiplications the HyperKZG verifier does would have to be done non-natively. That seems too expensive, though?

Here's my back-of-the-envelope calculation: the HyperKZG verifier does 2 pairings and 2 or 3 MSMs (not sure which, would have to check) of size log n, where n is the size of the committed polynomial (say, 2^24). The 2 pairings done non-natively would be about 6 million constraints. If the MSMs are implemented naively (i.e., no Pippenger's, as I'm not sure how much Pippenger's helps when implemented in constraints), that's 48 scalar multiplications. Each of those is about 1 million constraints if implemented non-natively (based on the current way people handle non-native arithmetic in constraints, anyway). So that's over 50 million constraints. That seems like too many constraints to run Groth16 on without a beefy machine (maybe it's several minutes of proving time with 32 threads? It's also not that much lower than the 2^27 constraint-count limit for Groth16 over BN254).
Please correct me if my estimates here seem off.
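For concreteness, here's the same arithmetic as a quick sanity check (these are the rough figures from above, not measurements):

```rust
fn main() {
    // 2 non-native pairings at ~3M constraints each (~6M total):
    let pairing_constraints = 2u64 * 3_000_000;
    // 2 MSMs of size log n = 24, done naively => 48 scalar multiplications,
    // each ~1M constraints when implemented non-natively:
    let msm_constraints = 2u64 * 24 * 1_000_000;
    // Over 50 million constraints in total:
    let total = pairing_constraints + msm_constraints;
    assert_eq!(total, 54_000_000);
    println!("estimated total: {total}");
}
```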
Another possibility would be to do a two-step composition to get Jolt proofs on-chain: Jolt-with-HyperKZG-over-BN254 proved by Spartan-with-Hyrax-over-Grumpkin proved by Groth16-over-BN254.
This would keep all arithmetic native. We would need both a HyperKZG-over-BN254 verifier expressed as R1CS over Grumpkin, and a Hyrax-over-Grumpkin verifier expressed as R1CS over BN254.
@GUJustin the second two-step recursion system is actually how we've dealt with Nova + CycleFold proof compression to reach Ethereum verifiability, i.e., (modified) Spartan-with-Zeromorph-over-Grumpkin proved by Groth16-over-BN254. Still, we needed very beefy machines to run this process.
This kind of complexity for doing proof compression for EC-based systems is likely a good reason to move away from them, is my gut feeling.
@GUJustin Definitely working towards on-chain verification, so yes, doing Groth16 over BN254. To clarify, the BLS12-381 port I'm mentioning above is for the non-native in-circuit pairing check for Jolt-with-HyperKZG-over-BLS12-381 verification proven by Groth16-over-BN254.
Choosing BLS12-381 because it has an existing R1CS circuit implementation in arkworks.
I'll update here on the non-native pairing costs when I'm done with a pairing circuit PoC.
I do like the idea of keeping arithmetic native. I'm also curious if there are other ways to do this than the two step composition.
> This kind of complexity for doing proof compression for EC-based systems is likely a good reason to move away from them, is my gut feeling.
@danielmarinq Hash-based proving systems need even more layers to get to apply Groth16. Most systems I'm aware of have three (or more?) layers of recursion to compress proofs and require beefy machines for those layers: (1) switch from "big" proof to "small" proof mode by altering the rate of the RS code, (2) switch from a "small" field to a "big" field to match the scalar field of a curve, and (3) apply Groth16 or Plonk. I don't think hash vs. curve affects the complexity here: switching to hash-based schemes doesn't magically solve the problem, and the switch may even make it somewhat harder.
> Spartan-with-Zeromorph-over-Grumpkin proved by Groth16-over-BN254.
@danielmarinq how do you run Zeromorph over Grumpkin? Doesn't ZM need a pairing-friendly curve for succinct verification?
Just pushed the PoC code for the pairing gadget in the non-native field (circuit is over BN254, the pairing is BLS12-381). The results are chilling: I haven't even got to the pairing itself yet, but just allocating a G1 element is more than 2M constraints. Allocating a G2 element is ~22M more. Here's the printout from the test run (the relevant part is in generate_constraints):
```text
····Start: Constraint synthesis
[jolt-core/src/circuits/pairing/mod.rs:120:13] cs.num_constraints() = 0
[jolt-core/src/circuits/pairing/mod.rs:125:13] cs.num_constraints() = 2341491
[jolt-core/src/circuits/pairing/mod.rs:128:13] cs.num_constraints() = 2346643
[jolt-core/src/circuits/pairing/mod.rs:134:13] cs.num_constraints() = 2346643
[jolt-core/src/circuits/pairing/mod.rs:139:13] cs.num_constraints() = 23268489
[jolt-core/src/circuits/pairing/mod.rs:142:13] cs.num_constraints() = 25619652
[jolt-core/src/circuits/pairing/mod.rs:148:13] cs.num_constraints() = 25619652
```
Hey folks, I understand this may not be a priority, but here's an implementation of HyperKZG verification in a SNARK. To make it work, I had to offload MSMs and multi-pairings to be performed by the SNARK verifier: this dramatically reduces the number of constraints in the circuit and allows building the circuit over the pairing's scalar field. Enjoy 😉
I did cheat a little: the transcript is implemented using a mock that calls the native ProofTranscript. For actual production use, the transcript should be implemented either with a circuit-friendly hash (MiMC or Poseidon) or with BSB22-style commitments (like in Gnark).
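For illustration, a production transcript gadget might expose an interface along these lines (hypothetical names mirroring the native ProofTranscript, not this PR's actual code), with the hashing done by a Poseidon gadget inside the circuit:

```rust
use ark_ff::PrimeField;
use ark_r1cs_std::fields::fp::FpVar;
use ark_relations::r1cs::SynthesisError;

/// Hypothetical in-circuit transcript interface. A circuit-friendly sponge
/// (e.g., Poseidon) would absorb appended elements and squeeze challenges,
/// entirely inside the constraint system.
pub trait TranscriptGadget<F: PrimeField> {
    fn append_scalar(&mut self, label: &'static [u8], x: &FpVar<F>) -> Result<(), SynthesisError>;
    fn challenge_scalar(&mut self, label: &'static [u8]) -> Result<FpVar<F>, SynthesisError>;
}
```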
Btw, here's how you can see it in action (having checked out this PR's branch):

```sh
RUST_TEST_NOCAPTURE=1 cargo test --package jolt-core --lib circuits::poly::commitment::hyperkzg::tests::test_hyperkzg_eval
```
> To make it work, I had to offload MSMs and multi-pairings to be performed by the SNARK verifier
How did you offload the MSMs?
@Forpee I've created a wrapper around Groth16 (other SNARK implementations, like Polymath, would also work). This wrapper uses special MSM and pairing gadgets to remember MSMs' scalars and G1 elements and pairings' G1 elements, and adds them to the public input. The verifier simply has to make sure MSMs and pairings evaluate to 0.
It's all in the code 😅
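To sketch the mechanism (with illustrative names, not the PR's actual types): the gadget performs no group arithmetic in constraints at all, it only records each MSM term, and the wrapper later surfaces the recorded scalars and points through the public input:

```rust
use ark_ec::pairing::Pairing;
use ark_r1cs_std::fields::fp::FpVar;

/// Records MSM terms instead of synthesizing them as constraints. The
/// Groth16 wrapper appends the scalar values and G1 points to the public
/// input, and the native verifier checks the resulting MSM evaluates to 0.
pub struct DeferredMsm<E: Pairing> {
    terms: Vec<(FpVar<E::ScalarField>, E::G1Affine)>,
}

impl<E: Pairing> DeferredMsm<E> {
    pub fn new() -> Self {
        Self { terms: Vec::new() }
    }

    /// Called wherever a scalar multiplication would otherwise be
    /// synthesized; only the scalar lives in the constraint system.
    pub fn push_term(&mut self, scalar: FpVar<E::ScalarField>, base: E::G1Affine) {
        self.terms.push((scalar, base));
    }
}
```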
Just a note: for the foreign-field arithmetic (FFA) that occurs in MSMs, there are tricks in SigmaBus that push the FFA to the verifier circuit, where it can be accumulated in a CycleFold instance. We will likely get all of this work into CycleFold and then use the PSE tricks to get CycleFold on-chain.
On L1 we pay for: 1 CycleFold instance for the FFA in MSMs (there could be multiple of these, amortized), and 1 Groth16 proof for the native-field circuit.
@wyattbenno777 are you talking about this paper? https://eprint.iacr.org/2023/1406
Also, where can I learn more about "the PSE tricks to get CycleFold on-chain"?
https://privacy-scaling-explorations.github.io/sonobe-docs/design/nova-decider-onchain.html
The PRs are good to review as well. There is a whole world of tricks to deal with non-native field args and proof composition.
https://github.com/privacy-scaling-explorations/sonobe/pull/132
Also, the newer Sigma paper:
https://eprint.iacr.org/2024/265
@GUJustin @moodlezoup Are you guys still interested in this?
Alas, our plans to get cheap on-chain verification (and folding) no longer involve implementing the HyperKZG verifier non-natively in constraints.
@GUJustin if you think this is a dead end, I'll close this PR.
But also, what would you suggest on how to move forward towards cheaper verification (and folding)?
Also, to clarify, what's implemented here doesn't involve non-native arithmetic: MSM and pairing definitions are simply appended to the public input to be checked by the verifier. The circuit only deals with the scalar field (natively).
> Also, to clarify, what's implemented here doesn't involve non-native arithmetic: MSM and pairing definitions are simply appended to the public input to be checked by the verifier. The circuit only deals with the scalar field (natively).
You're saying that you're taking the HyperKZG verifier (when the committed polynomial is defined over the BN254 scalar field), and expressing it as constraints over the BN254 base field, so that all of the scalar multiplications and pairings done by the HyperKZG verifier are natively defined over the field used by the constraint system?
If there's no non-native arithmetic happening in the constraints, then the HyperKZG verifier should only require a few hundred thousand constraints at the most.
Here's the math: The HyperKZG verifier does a couple of dozen scalar multiplications and two pairings.
A scalar multiplication is ~400 group additions (actually, ~256 group additions, since we could use 128-bit scalars, but let's ignore that). Each group addition is like ~10 field multiplications. So the scalar multiplications should conservatively be about 30 · 400 · 10 = 120K constraints.
And a pairing is equivalent to several thousand group additions, so the two pairings should only be about 2 · 2,000 · 10 = 40K constraints, which is way below 100K.
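In code, that estimate works out to (same ballpark numbers as above):

```rust
fn main() {
    // ~30 scalar muls x ~400 group adds each x ~10 field muls per add:
    let scalar_mul_constraints = 30u64 * 400 * 10; // 120K
    // 2 pairings x ~2,000 group adds each x ~10 field muls per add:
    let pairing_constraints = 2u64 * 2_000 * 10; // 40K
    // Well under a few hundred thousand constraints in total:
    assert!(scalar_mul_constraints + pairing_constraints < 200_000);
}
```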
So if you're seeing tens of millions of constraints, I have to think you're actually having the constraints non-natively implement the field operations done by the HyperKZG verifier. This would be needed, for example, if the constraint system is defined over the BN254 scalar field.
> But also, what would you suggest on how to move forward towards cheaper verification (and folding)?
For some details, see https://jolt.a16zcrypto.com/future/groth-16.html#a-way-forward and https://github.com/a16z/jolt/blob/main/book/src/future/folding.md
> You're saying that you're taking the HyperKZG verifier (when the committed polynomial is defined over the BN254 scalar field), and expressing it as constraints over the BN254 base field, so that all of the scalar multiplications and pairings done by the HyperKZG verifier are natively defined over the field used by the constraint system?
No. I'm taking the HyperKZG verifier (when the committed polynomial is defined over the BN254 scalar field) and expressing it as constraints over the BN254 scalar field. I also take
- BN254 G1 elements and scalars for MSMs,
- BN254 G1 elements for the multi-pairings,
and append them to the public input. The verifier must make sure the MSMs and multi-pairings evaluate to 0.
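A minimal sketch of the corresponding native check on the verifier side (assumed shapes and names; the actual check lives in this PR's wrapper code):

```rust
use ark_ec::pairing::{Pairing, PairingOutput};
use ark_ec::VariableBaseMSM;
use ark_std::Zero;

/// Checks the work deferred to the verifier: every MSM reconstructed from
/// the public input must evaluate to the identity, and the multi-pairing
/// (in additive notation for the target group) must evaluate to 0.
fn check_deferred<E: Pairing>(
    msms: &[(Vec<E::ScalarField>, Vec<E::G1Affine>)],
    pairing_terms: &[(E::G1Affine, E::G2Affine)],
) -> bool {
    let msms_ok = msms
        .iter()
        .all(|(scalars, bases)| E::G1::msm(bases, scalars).map_or(false, |s| s.is_zero()));
    let (g1s, g2s): (Vec<_>, Vec<_>) = pairing_terms.iter().copied().unzip();
    msms_ok && E::multi_pairing(g1s, g2s) == PairingOutput::<E>::zero()
}
```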
> If there's no non-native arithmetic happening in the constraints, then the HyperKZG verifier should only require a few hundred thousand constraints at the most.
Exactly right! When you run
```sh
RUST_TEST_NOCAPTURE=1 cargo test --package jolt-core --lib circuits::poly::commitment::hyperkzg::tests::test_hyperkzg_eval
```
on this branch, you get a circuit with 112136 constraints.