snark
snark copied to clipboard
Small Vec Linear Combinations
Summary
As currently written, a symbolic LC is created on every operation. We should expect that most Symbolic LC's are small, and probably should try to keep things on stack via small vec optimizations, perhaps using https://github.com/servo/rust-smallvec.
We definitely shouldn't switch without a more detailed benchmark suite (getting average symbolic LC size, seeing how much of the time is spent in mallocs, seeing if things actually get faster etc.)
For Admin Use
- [ ] Not duplicate issue
- [ ] Appropriate labels applied
- [ ] Appropriate contributors tagged
- [ ] Contributor assigned/self-assigned
A separate proposal would be to avoid the generation of too many symbolic LC.
Currently, each AllocatedFp is associated with a Variable in the constraint system. Thus, mul_by_constant and add would all create a new symbolic LC.
One potential direction is to let AllocatedFp uses a linear combination of variables instead of a symbolic LC, if it is optimized for the number of constraints.
One potential direction is to let AllocatedFp uses a linear combination of variables instead of a symbolic LC, if it is optimized for the number of constraints.
I think this is a great idea. The alternative idea I was thinking of was to introduce drop semantics to variables, but that was growing in complexity very rapidly. (We may still need something of that form to remove symbolic LC's that are intermediate computation steps though, in the case of density optimizations)
One potential direction is to let AllocatedFp uses a linear combination of variables instead of a symbolic LC, if it is optimized for the number of constraints.
One way to do this is to change the Variable
enum as follows:
pub enum Variable<F> {
/// Represents the "zero" constant.
Zero,
/// Represents of the "one" constant.
One,
/// Represents a public instance variable.
Instance(usize),
/// Represents a private witness variable.
Witness(usize),
/// Represents of a linear combination.
SymbolicLc(InlineOrIndex<F>),
}
// Needed if we want to make this an implementation detail.
pub struct InlineOrIndex<F>(InlineOrIndexInner<F>);
enum InlineOrIndexInner<F> {
Index(usize),
Lc(StackVec<(F, Variable<F>)>), // Not sure if this needs to be `Box`
}
The downside of this approach is that Variable
is no longer Copy
, but can only be Clone
; this is a breaking change. Another breaking change is the addition of the Field
type parameter F
.
Separately, one might want to reduce the memory consumption of Variable
by replacing usize
with u32
. We could also try u48
if u32
is too small; this can be implemented as a new type around [u8; 6]
or [u16; 3]
or (u32, u16)
.
Along with coefficient-interning (#122), this could lead to a significant memory improvement.
I feel like we will need to eventually remove Copy
from Variable anyway. Even in the case of optimizing for density, I think we want to add some drop semantics so that we can prune intermediate, only used once, symbolic LC's from memory. (Which then necessitates removing copy)
Separately, one might want to reduce the memory consumption of Variable by replacing usize with u32. We could also try u48 if u32 is too small; this can be implemented as a new type around [u8; 6] or [u16; 3] or (u32, u16).
Its not clear to me that that is going to be reducing memory under the inline or index API. Symbolic LC could still be storing an LC
, which is a vec, which AFAIU takes the same size as 3 * sizeof(usize)
. (So the stack size taken would increase)
In the case where it doesn't have inline or index
, its still not clear to me that it reduces size, since the enum itself has memory overhead AFAIU. (One byte for under 256 options, but I don't know how this is aligned)
I think changing FpVar
to directly use a linear combination in both cases may work, since the LC can just store the symbolic variable (which is equivalent to what is already happening)
Its not clear to me that that is going to be reducing memory under the inline or index API. Symbolic LC could still be storing an LC, which is a vec, which AFAIU takes the same size as 3 * sizeof(usize). (So the stack size taken would increase)
You're right, that optimization is not useful. Separately, to compress the size for SymbolicLC
, we could Box
that particular variant of the enum, which should bring the size down to size_of::<usize>()
.
I think changing FpVar to directly use a linear combination in both cases may work, since the LC can just store the symbolic variable (which is equivalent to what is already happening)
The reason I wanted to put this in ConstraintSystem
and not just in FpVar
is to enable this for other gadgets too.
In the case of density optimizations, I think we want to keep building it up in LC
representation, and condense that into a saved Symbolic LC
every time that variable is used in a constraint. That approach has the problem of many operation may mutate both argument. (e.g. a *= b
mutates b
).
An alternative to that is that we introduce total_num_refs
+ drop semantics to Variable. Every time a variable is cloned we increment its total_num_refs
. If a variable is dropped and its total_num_refs
was 1, we know it was an intermediate variable, and it can be pruned from the symbolic LC map.
(This approach should still have worse performance than the first approach, since we save the symbolic LC and remove it later, instead of just never saving it in the first place)
Copying over from another issue:
I think there are two approaches:
-
We have linear combinations either in the variable directly, or in the
lc_map
, but not in both. This way, if we inline LCs, then when the variable goes out of scope, the memory for the LC is freed immediately, and we don't have to clean up thelc_map
. -
We hold a
Rc<RefCell<LcMap>>
or something inside aSymbolicLc
, and when theSymbolicLc
gets dropped, we look inside theLcMap
and do some garbage collection there. Not sure how expensive this would be, though. This latter approach is similar to your approach
Separately, it would also be worthwhile to consider what can be done for density optimizations; the foregoing mostly only makes sense when optimizing for constraints.
If a variable is dropped and its total_num_refs was 1, we know it was an intermediate variable, and it can be pruned from the symbolic LC map.
Every variable is eventually dropped, so we can't necessarily prune, if we want to have the "outlining" optimization available to use.
Every variable is eventually dropped, so we can't necessarily prune, if we want to have the "outlining" optimization available to use.
Err total_num_refs
was probably the wrong name. I meant it as max_num_refs
, e.g. if there was ever more than 1. On further thought, this should instead be a boolean. (Perhaps re_used_lc
as the name) And then every time a constraint is created from an LC, that lc gets saved to the LC map.