triton-vm
triton-vm copied to clipboard
U32 Table Design explorations
There are multiple ways to achieve the current functionality of the U32 Table. Let's explore the effects of different designs on both TritonVM as a whole and the recursive verifier specifically.
Re-Design Exploration 1 – Combine Permutation Arguments
Currently, there are 5 Permutation Arguments between the Processor Table and the U32 Table. The running product for all these Permutation Arguments is only updated in rows where the indicator idc
is 1.
By adding column current instruction ci
to the U32 Table, we can get away with only one Permutation Argument, which is conditional on not only on the indicator idc
but also on ci
.
Immediate upsides:
- fewer columns in U32 Table
- fewer columns in Processor Table
Immediate downsides:
- AIR of slightly higher degree for U32 Table
- AIR of slightly higher degree for Processor Table
- no re-usability of different result columns for same input
Re-Design Exploration 2 – Combine Result Columns
The Processor conditions the legal relation of two consecutive rows based on the current instruction ci
. If we add ci
as a column to the U32 Coprocessor, we can do exactly the same thing. This allows us to combine currently existing columns LT, AND, XOR, REV into a single column OP_RESULT.
Immediate upsides:
- fewer columns in U32 Table
Immediate downsides:
- considerably more complex AIR for U32 Table
Re-Design Exploration 3 – Ditch U32 Table
Instead of using a dedicated Coprocessor for u32 operations, we can enhance and use the Processor directly. This requires decomposition into bits using helper variables, which can then be immediately combined into the desired result. In conjunction with re-design exploration 4, the negative impacts are less severe by a factor of 8.
Immediate upsides:
- less high-level complexity of Triton VM
Immediate downsides:
- many, many more columns in the Processor Table
- somewhat more complex design of the Processor
Re-Design Exploration 4 – Add U8 Lookup Table
Currently, the U32 Coprocessor deals with one bit at a time. Having access to a static, pre-computed lookup table for bytes allows the U32 Coprocessor to deal with one byte at a time. Correct lookup is proven via some argument, and might require the introduction of a new argument type to Triton VM.
Immediate upsides:
- faster U32 Coprocessor
Immediate downsides:
- higher design complexity
(Originally issue number 26 in the internal issue tracker.)
(Originally posted by @bobbinth.)
In Miden, we have a separate co-processor for bitwise u32 operations: AND, OR, and XOR. These co-processor decomposes inputs into individual bits (4 bits per row), computes operations on individual bits, and then aggregates the result. So, it takes 8 rows to compute a single u32 AND/OR/XOR. As described here, it is actually possible to get rid of 2 of these 3 operations, and emulate everything else with just a single operation.
All other u32 operations we perform directly on the stack. This includes u32add, u32sub, u32mul, u32div etc. (we have 8 such operations in total). All these operations require between 2 and 4 16-bit range checks, but for simplicity we implemented all of them as consuming exactly 4 16-bit range checks.
Through Miden assembly we expose quite a few more u32 operations but all of them are emulated using one of the above operations (e.g., lt
is emulated as u32sub swap drop eqz not
).
This issue is de-facto blocked by not having a benchmarking target. See #7.
See #9 for the type of “new argument type” alluded to in option 4.
The U32 Table has been dropped for now. This issue can be re-opened if acceleration for u32 operations are deemed necessary.
Acceleration for u32 operations are deemed necessary.
@aszepieniec and I have developed the following design for a next iteration of the U32 Table. None of the functionality provided by the U32 Table changes. The new design has fewer rows but more columns, which is beneficial for overall speed.
The base columns are:
name | description |
---|---|
CopyFlag |
Should the result from this row be sent to the processor? |
LimbWidth |
How wide each of the limbs is, in bits. |
CI |
The instruction the processor is currently executing. Only relevant if CopyFlag is set. |
LHS_hi |
The high bits of the left-hand side operand. |
LHS_lo |
The low bits of the left-hand side operand. |
RHS_hi |
The high bits of the right-hand side operand. |
RHS_lo |
The low bits of the right-hand side operand. |
AND_hi |
The bit-wise and of LHS_hi and RHS_hi . |
AND_lo |
The bit-wise and of LHS_lo and RHS_lo . |
LT_hi |
Result of the less-than comparison between LHS_hi and RHS_hi . |
LT_lo |
Result of the less-than comparison between LHS_lo and RHS_lo . |
log_2_floor_hi |
The number of bits needed to represent LHS_hi minus one. |
log_2_floor_lo |
The number of bits needed to represent LHS_lo minus one. |
mul_self |
The multiplicity with which the U32 Table has looked up LHS and RHS in itself. |
mul_processor |
The multiplicity with which the processor has looked up this instruction (with the same operands.) |
The U32 Table looks up successively narrower limbs of the operands in itself, essentially creating a (stacked) multi-cascade lookup table. See § 4.2 of the Tip5 paper for an explanation of cascading lookup tables.
The last section of the U32 Table, i.e., when LimbWidth
is equal to 1, corresponds to a narrow lookup table that can be computed by the verifier in full. See § 4.3 of the Tip5 paper for an explanation of narrow lookup tables.
The rules for u32 instruction pow
are somewhat distinct from the other u32 instructions'. Concretely, the left-hand side LHS
, serving as the instruction's base, is not restricted to be in the u32 range. Consequently, the new U32 Table has a dedicated section for instruction pow
.