Properly support the underlying slot algebra in the compilation pipeline
Given #1186 and the discussion on ciphertext semantic in layout conversion passes in HEIR meeting, we might want to correctly model the slot-algebra in bgv dialect (like bgv.rotate_dim), to avoid bad lower dialect semantic affecting earlier layout decision.
We might also want to introduce earlier dialect operation like tensor_ext.rotate_dim to correspond to lower FHE scheme semantic. tensor_ext.permute can then be lowered to tensor_ext.rotate_dim.
Currently, the semantic of bgv.rotate is 1-dim rotation of all slots, which is in accordance with our programming intuition, but not (always) in accordance with the underlying slot-algebra.
For the slot-algebra, HElib has a very detailed explanation in Section 3 of Design and implementation of HElib. In short, it is the plaintext modulus t and the cyclotomic order m that determines the underlying slot-algebra via the structure of automorphism group.
-
For prime
tand power-of-twom, the automorphism group has the structure of(Zm)* = Zm/4 x Z2, then the slot can be viewed as am/4 x 2matrix. -
For prime
tand primem, the automorphism group is(Zm)* = Z(m-1), a cyclic group and the slot can be viewed as a 1 dim vector.
Current FHE libraries commonly consider these two cases, and the first one is more widely adopted. Openfhe/Lattigo only support power of two m.
- Extra note is that, even Openfhe BGV support only exposes the API of rotate, its underlying slot-algebra is still m/4 x 2. See https://github.com/google/heir/issues/1186#issuecomment-2542791299
In the meantime, our LWE type system does not rule out all possibilities of slot-algebra, so our modeling in bgv dialect should be generic enough to embrace these possibilities, or at least these two cases.
(I will just focus on BGV since the issue is the same for all SIMD schemes)
The main question I have is whether we should consider the slot algebra part of the definition of the scheme. E.g., is BGV defined such that its rotation group is a direct product of Z/2Z and Z/(N/2)Z? Or is this a choice that a lowering should make when converting BGV to {library API, polynomial}?
If we call it an inherent part of BGV, then we make bgv.rotate natively support the algebra (or change it to two ops like rotate rows/cols). In that case, a scheme that uses a different slot algebra would require a different scheme dialect. We would use dialect/op interfaces to reduce repeated work across the dialects. Then we could continue to do high-level optimizations for the shift network implementation.
If it's not inherent to BGV, then the lowering gets to decide how to implement rotation. In this case the lowering to polynomial (or to some hardware backend) might differ from the lowering to openfhe or lattigo, and we accept that by either:
- Adding higher-level ops as Hongren mentioned for algebra-aware rotations, and having separate passes that optimize for each algebra. The pass must be selected knowing the ultimate target backend. To a small extent, this bifurcates the compiler pipelines a bit more, which will add more work. To a larger worry, it will require us to add (or generalize) ops at each scheme dialect to support the more complex permutation options, even when many of these options will not be supported by most backends (they probably only support one algebra).
- Adding lower-level patterns that try to make the implemented higher-level shift network (in terms of 1D rotations) more efficient for the particular slot algebra of that backend. This will likely work but not be as optimal as reimplementing the entire shift network from scratch aware of the algebra.
- Defaulting to overhead of implementing a 1-D rotate in terms of a rotate_rows + rotate_cols with masking to reassemble. (probably not acceptable long term for efficiency)
- Defer the implementation of the shift network until the scheme-to-backend lowering step, and add
bgv.permuteops to replacebgv.rotate. This still requires separate passes as in (1), but they are localized to the backend that has support for that algebra.
Conceptually, rotation algebra probably isn't part of the BGV scheme, as the scheme doesn't even really care whether you use SIMD batching at all, so the idea of bgv.rotate might not even be well-defined for all plaintext encodings. If we want to support weird algebras in the future, I think we'd have to do so with an apply_automorphism op or something equally unpleasantly low-level.
However, from a practical point of view, we'll not have support for any of these advanced cases for quite some time, so imho it's OK to hardcode (at least for now) the power-of-two semantics used by most libraries at the bgv level (e.g., by splitting rotate into rotate_rows/rotate_cols). We can always add a verifier to make sure we're only applying them to ctxts with the crt-packed-ptxt-encoding.
EDIT: apply_automorphism will probably be necessary at some point anyway (aren't there some advanced techniques that use frobenius automorphisms to exponentiate-and-permute, which really doesn't make sense to model with rotate anymore?), and then we can just treat rotate_rows/col like "named" apply_automorphism ops.
the power-of-two semantics used by most libraries at the
bgvlevel (e.g., by splittingrotateintorotate_rows/rotate_cols).
I agree with this direction. We could first make a PR on this change.
For automorphism, one complicacy is to compute from the rotation index k to automorphism galois element 5^k % m or (-1)^k % m. As not all backends respect the cyclotomic order m generated by heir, we should still ask backend to do rotation k instead of automphism for now.
OK let's go ahead and encode rotate_{rows,cols} into BGV/BFV/CKKS scheme dialects. FYI @asraa
Maybe to ease the initial integration, we can have a rewrite pattern that implements the current rotate op in terms of rotate_rows/cols?
encode rotate_{rows,cols} into BGV/BFV/CKKS
This is only for BGV/BFV
we can have a rewrite pattern that implements the current
rotateop in terms of rotate_rows/cols?
As far as I know there is no example program that fully exploits the slots in BGV and all of them assume 1-D vector, so in my PR I just replace bgv.rotate with bgv.rotate_cols
Also, I think such rewrite should happen early at tensor_ext level?
Yes, and I think we should file an issue (or include in this issue) to provide such an example and fix the lowering from tensor_ext to bgv to support full slot usage.
This issue has 1 outstanding TODOs:
- lib/Transforms/ConvertToCiphertextSemantics/ConvertToCiphertextSemantics.cpp:161: Determine if we should support non-cyclic slot algebras here
This comment was autogenerated by todo-backlinks