HElib
HElib copied to clipboard
reduce the number of DoubleCRT constants in unpack function
PR for this issue.
Unfortunately, the current code increases the number of frobeniusAutomorph
(i.e., O(d^2)) times.
(Perhaps with the Baby-step/Giant-step tech, we can reduce it to O(d \sqrt(d)))
Even with hoisting, I found it slower than previous version.
Following is slow due to O(d^2) frobeniusAutomorph;
DoubleCRT coeff(unpackSlotEncoding.front(), ctxt.getContext(), ctxt.getPrimeSet());
Ctxt tmp1(ZeroCtxtLike, ctxt);
for (long i = 0; i < unpacked.size(); i++) {
/// accumulative frobenius: frobenius(v, 2) = frobenius(frobenius(v, 1), 1)
if (i > 0)
coeff.automorph(t % m);
*(unpacked[i]) = ctxt;
unpacked[i]->multByConstant(coeff);
Ctxt origin(*unpacked[i]);
/// TODO use hoisting here
for (long j = 1; j < d; j++) {
tmp1 = origin;
tmp1.frobeniusAutomorph(j);
tmp1.cleanUp();
*(unpacked[i]) += tmp1;
}
}
A quick fix is to exchange the outer-side and inner-side loop.
This keeps the number of frobeniusAutomorph
to O(d); at the cost of increasing the number of
multByConstant
from O(d) to O(d^2) (also, with BSGS, it should be O(d sqrt(d))
// make sure the unpacked ciphertext is empty
for (long i = 0; i < unpacked.size(); i++)
unpacked[i]->clear();
DoubleCRT coeff(unpackSlotEncoding.front(), ctxt.getContext(), ctxt.getPrimeSet());
Ctxt tmp1(ZeroCtxtLike, ctxt);
Ctxt tmp2(ZeroCtxtLike, ctxt);
/// TODO Use hoisting for the outer-loop
for (long j = 0; j < d; j++) {
tmp1 = ctxt;
tmp1.frobeniusAutomorph(j);
tmp1.cleanUp();
for (long i = 0; i < unpacked.size(); i++) {
tmp2 = tmp1;
auto coeff_cpy{coeff};
coeff_cpy.automorph(NTL::PowerMod(t, mcMod(i + j, d), m));
tmp2.multByConstant(coeff_cpy);
*unpacked[i] += tmp2;
}
}
Hi @fionser, would you have time to rebase your fork and check the conflicts or whether your suggestions still apply to the newer version of HElib, please?
Thank you