snark icon indicating copy to clipboard operation
snark copied to clipboard

Remove duplicate constraints

Open ValarDragon opened this issue 4 years ago • 14 comments

If your performing exponentiation on the same variable in multiple locations in your code, you may have duplicate constraints in many places.

Ideally the backend should be able to catch these and only use a single constraint. We don't want to do this at constraint-addition, since that would be a quadratic overhead. It should be doable in nlog(n) time via a sorting algorithm on the hash of constraints.

EDIT: We could have a hashmap on the constraints, and then do it at constraint-addition time, since thats only a linear overhead.

ValarDragon avatar Feb 27 '20 02:02 ValarDragon

I'm unsure how your backed would catch these, but you could've some preparation builder presumably.

burdges avatar Jun 13 '20 06:06 burdges

My only question is whether this would actually be a concern in practice.

jon-chuang avatar Jun 18 '20 16:06 jon-chuang

For a circuit I'm working on, this saves ~20k to 30k constraints. (in general any circuit which has exponentiations of the same base to several constant exponents should get savings from this)

I imagine the back-end would catch these by maintaining a hashmap of constraints, and then everytime a constraint is added, check if its included in the hashmap.

ValarDragon avatar Jun 19 '20 05:06 ValarDragon

I see, and what is the total circuit size? If it is 1 million constraints, that's a 3% saving and substantial though not significant.

Could you link me to this circuit so I can understand how such duplicate values would arise?

jon-chuang avatar Jun 19 '20 07:06 jon-chuang

The circuit is not publicly available at the moment, but its on the order of 1-2 million constraints. These duplicate constraints arise from having to exponentiate a variable to multiple different constants in the circuit.

This seems like something thats desirable in general, since it lets you use abstractions without being concerned of duplicate constraints underneath.

ValarDragon avatar Jun 19 '20 07:06 ValarDragon

I am thinking of a design for an optimising compiler, specifically for plonk, so I think there might be some sense in developing a common frontend. Requires investigation.

The optimising compiler will have a backend component and a frontend component. The front end will optimise on the AST and IR-level as defined in some virtual machine. Then, one can perform simple things like constant subexpression elimination.

The expressiveness of Plonk (in particular turboplonk) seems to suggest to me that it is worth developing a common virtual machine interface sitting on top of the constraint system that is gate specific. The virtual machine is fully extensible in terms of adding additional turboplonk style custom gates and plookup table lookups. The virtual machine will execute program semantics defined in terms of AtomicOperations and fill in witness values defined in terms of single static assignments (this is a plonk specific abstraction for the permutation arguments).

I think that perhaps in a plonk style argument, this sort of optimisation doesn't make too much sense, as we are using the Lagrange basis, rather than the monomial basis, and duplicate var constraints actually abound, as that is the whole point of a permutation argument.

I think there may be sense in developing a common virtual machine interface that can also compile down to r1cs-style constraints.

jon-chuang avatar Jun 19 '20 07:06 jon-chuang

However, one thing that does interest me is developing turboplonk-like custom gates for R1CS.

I have a feeling this would not be hard to do and could lead to massive savings in terms of constraints. I will try to investigate how to do this with Marlin.

My first effort will be to try to implement single-gate ECC NAF addition for R1CS. I have a feeling that R1CS custom gates might be able to handle high arity gates like this much better.

jon-chuang avatar Jun 19 '20 07:06 jon-chuang

The turbo-plonk IR is really AIR, currently the only front-end for AIR I'm aware of is: https://github.com/GuildOfWeavers/AirScript, though this tracking issue from Coda seems to be a more limited starting point thats maybe closer to the VM system you have in mind: https://github.com/o1-labs/marlin/issues/45

The TurboPlonk optimization is then to run an IOP for AIR and an IOP for R1CS in parallel, and then add consistency tests for the relevant variables being the same across both representations. You can definitely swap in Marlin/Fractal for the IOP for R1CS! There are a couple of ways of doing this consistency test on the back-end, that doesn't end up being too big of a cost (especially with KZG commitment's, since you have additively homomorphic commitments). You can do the same trick you do for public input, or literally check equality of the relevant commitments on certain domains just like in rowcheck, etc.

The front-end task you would then have is to build a front end for AIR, and then mark where variables get shared between R1CS and AIR. This would be very cool to have!

ValarDragon avatar Jun 19 '20 13:06 ValarDragon

I think that perhaps in a plonk style argument, this sort of optimisation doesn't make too much sense, as we are using the Lagrange basis, rather than the monomial basis, and duplicate var constraints actually abound, as that is the whole point of a permutation argument.

Could you explain this please? I don't understand why this is true of the actual Plonk circuit. Given two wires, w1, w2, I should only have the constraint w1 * w2 once in my circuit. It would be an inefficient circuit if I did that twice (in particular all my polynomials would have larger degrees)

It happens to be the case that the SNARK back-end does something that involves copy constraints everywhere, but it would still be more expensive if I repeated constraints in my circuit representation.

ValarDragon avatar Jun 19 '20 13:06 ValarDragon

Hmm I see. For what you are suggesting, personally I would prefer to remove it at an IR level. But I guess there's no harm in doing it at the constraint level too.

Not only should one take into account constraint level equivalence, one should take into account semantic equivalence too. For instance, addition or multiplication is commutative.

I don't think you need hashes, just use SSA IDs (unique var/register assignments) and gate IDs. Just put the SSA IDs wherever commutative in a list and sort that list. And maintain a struct containing gate ID followed by an ordered list of lists. Where the order of the lists are defined for each gate.

One must then define these semantic equivalences like commutativity at the gate level.

Still I think this is very inefficient. For instance, how should one identify and remove the constraint repetition, where a is a constant:

z<-x.y x'<-a.x y'<-a.y z'<-x'.y'

x' and y' then go out of scope.

With z<-x.y z''<-a^2.z

Any standard compiler can do this at the IR level.

jon-chuang avatar Jun 19 '20 13:06 jon-chuang

Even if we had an IR with optimizations, I think this should still be a useful optimization unique to R1CS / Plonk's intermediate representation. (Or I've misunderstood how the envisioned IR works)

IR's optimize out constants in a block / scope. In R1CS / these uniform computation models, they have the benefit of there being no scope. So for instance, if I do exp(x, 16) in the beginning of my program, and exp(x, 17) at the end of my program, it is reasonable to de-duplicate the intermediate constraints x, x^2, x^4, x^8.

A normal IR wouldn't do this for you for sufficiently long problems, because those temporaries are not in scope / in a single block. (Or contrarily, 1 block is the entire program but this would be too slow for many typical per-block optimizations) Especially since removal of this re-computation can induce large overheads depending on the back-end you compile to. (For instance, if you compiled to AIR/Turbo-Plonk/GKR)

ValarDragon avatar Jun 19 '20 14:06 ValarDragon

Hmm I am inclined to agree with you. It does depend on how the IR is handled. So I think it's a good safety net worth investing in, especially since we don't control the IR, for now at least.

I'll try to incorporate it into my plonk design.

Maybe someday we will have a Zexe IR :)

jon-chuang avatar Jun 19 '20 14:06 jon-chuang

We should also keep in mind that Marlin, for example, has different optimizations. One example is the "linear combination" being a first-order citizen in R1CS, while in Marlin it might result in worse performance - you're better off adding one constraint that assigns the linear combination to a new variable and using it rather than using the same linear combination in 10 different places.

kobigurk avatar Jun 19 '20 14:06 kobigurk

Hmm I see, I guess marlin should have a gate cost model in terms of arity. It will then be easy to decide between two semantically equivalent statements.

jon-chuang avatar Jun 19 '20 14:06 jon-chuang