snarkVM
snarkVM copied to clipboard
Optimize masked Pedersen hashes in POSW
To achieve better amortization resistance, we should optimize the masked Pedersen hash.
Currently we use two-bit lookups for both coordinates of each generator and normal Edwards addition.
Main idea for optimization - use the Montgomery form of the Edwards curve, which would require us to use powers of the same generator for as long as we can (~50-70 powers). The main point we must take care of is that the identity point can't be added when using this form.
I think that if I compute the xor, I can do it in 3 constraints: z_i = x_i \xor p_i
final_X = z_i*((X of g_i) - (X of 1)) + (X of 1) result_Y = z_i*((Y of g_i) - (Y of 1)) + (Y of 1) and final_Y = (1-2*p_i)*result_Y
Can we apply the constraints for the generator only once outside:
For G_i = (x_i, y_i), we define variable g^y_i = y_i * (1 - 2 \rho_i) outside the circuit but inside the masked Merkle tree. Then in the lookup process for each circuit, we do:
- Take the XOR: z_i = m_i \xor \rho_i
- y_s = z_i * ((g^y_i) - e_1)) + e_1
That way we do all the checks at the beginning that set the g^y_i based on the \rho_i, saving one extra constraint inside the circuit. For X we won't have to do this since the negative/positive are the same.
For a 3-bit window with Montgomery multiplication, the case where you need to invert to counteract for the H(\rho) is less obvious.
For some 3-bit input x_1x_2_3 and \rho_1\rho_2\rho_3 we can use one constraint to check: 4x_3+2x_2+x_1 - 4\rho_3 - 2\rho_2 - \rho_1 = (4a_3 + 2a_2 + a_1) * (1 - 2a_4)
The above converts x - \rho (in 3 bits) to a 4-bit signed integer. We also need to check booleanity of the a_i, which in total gives 4 + 1 = 5 constraints for 3 bit inputs, or 1.66 constraints/bit.
The value a = a_1a_2a_3a_4 can then be fed as is into the selector, which looks up and multiplies by the respective coordinate [(1 - 2a_4)*(a_1 + 2a_2 + 4a_3)]P. We need to ensure that the case a_1 = a_2 = a_3 = 0 always multiplies by the identity (or just ignore this multiplication).
This should be usable without changing the Montgomery point multiplication nor the rest of the structure of the Pedersen hash. For the above case, we'd need 1 + 3 = 4 constraints for the sign and sum (due to the Montgomery form), we'd then also need 3 extra constraints to look up the coordinates in the 4-bit lookup, and the 5 constraints above that actually give us the index we're looking up. Overall we get 4 constraints/bit with this method.
@akattis Is this issue still relevant for us?
Architecture of the circuit hasn't changed in a way that makes this infeasible. However, I think the tradeoff between engineering time to build this and the upside isn't too great - i.e. we'll get a smaller circuit that can fit in more transactions at a given time, but the process of building it is going to be messy and complicated and we're only going to get something like a 2x speedup for a given tx number. I'd say that currently this shouldn't be a priority, but if we need to make the circuit more efficient it's a way to do so. Probably should revisit this when we have an idea of what the standard miner hardware will look like.
I agree with @akattis that we should revisit this only if we see it's necessary. I like that the current implementation is simple - it makes it more auditable and "obviously correct".
Transferred the issue to snarkVM, as posw no longer resides in snarkOS.