Improve practical perf of bit decompose?
The current approach creates a single bit tensor on host placements by stacking several bit tensors together. However, we often call `index` on the stacked tensor afterwards to separate them again. Would it be better to use bit arrays throughout, backed not by a single bit tensor but by, e.g., a vector of bit tensors?
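To make the trade-off concrete, here is a minimal sketch of the two layouts. `BitTensor`, `StackedBits` and `BitArray` are hypothetical stand-ins (one `u8` per bit), not moose's actual host types. The point is only that recovering a plane from the stacked layout copies it, while the vector layout hands out a borrow.

```rust
/// Hypothetical stand-in for a host bit tensor: one `u8` (0 or 1) per element.
/// This is NOT moose's `HostBitTensor`; it only models the data layout.
#[derive(Clone, Debug, PartialEq)]
pub struct BitTensor(pub Vec<u8>);

/// Stacked layout: all bit planes in one buffer of shape [n_bits, len], row-major.
pub struct StackedBits {
    pub n_bits: usize,
    pub len: usize,
    pub data: Vec<u8>,
}

impl StackedBits {
    /// Stack several equally sized bit tensors into a single buffer.
    pub fn stack(planes: &[BitTensor]) -> Self {
        let len = planes.first().map_or(0, |p| p.0.len());
        assert!(planes.iter().all(|p| p.0.len() == len), "planes must have equal length");
        let mut data = Vec::with_capacity(planes.len() * len);
        for p in planes {
            data.extend_from_slice(&p.0);
        }
        StackedBits { n_bits: planes.len(), len, data }
    }

    /// Recover bit plane `i`; this allocates and copies `len` elements.
    pub fn index(&self, i: usize) -> BitTensor {
        assert!(i < self.n_bits, "bit index out of range");
        BitTensor(self.data[i * self.len..(i + 1) * self.len].to_vec())
    }
}

/// Vector layout: the planes are kept as separate tensors.
pub struct BitArray(pub Vec<BitTensor>);

impl BitArray {
    /// Accessing plane `i` is a borrow; no copy is made.
    pub fn index(&self, i: usize) -> &BitTensor {
        &self.0[i]
    }
}
```

With `n` bits per value, splitting a stacked tensor back into planes copies every element once more, which is the overhead the vector layout would avoid.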
While I agree that it would be nice to avoid indexing all the time, I think it is sometimes useful to have a single bit tensor to work with, since that lets us vectorize some instructions.
For example, bit decomposition benefits from this vectorization (https://github.com/tf-encrypted/runtime/blob/main/moose/src/replicated/mod.rs#L2746), as does the binary adder (https://github.com/tf-encrypted/runtime/blob/main/moose/src/replicated/mod.rs#L2546).
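A rough sketch of why a single tensor helps, under an assumed cost model: every call to a secure AND costs one communication round regardless of how many elements it processes, which is the usual reason for batching ANDs in replicated secret sharing. `RoundCounter`, `and_per_plane` and `and_stacked` are hypothetical names; no actual secret sharing happens here, only the round count is modeled.

```rust
use std::cell::Cell;

/// Hypothetical model of a secure AND protocol where each call costs one
/// communication round, independent of the number of elements processed.
pub struct RoundCounter {
    rounds: Cell<usize>,
}

impl RoundCounter {
    pub fn new() -> Self {
        RoundCounter { rounds: Cell::new(0) }
    }

    pub fn rounds(&self) -> usize {
        self.rounds.get()
    }

    /// Elementwise AND of two equally sized bit vectors; counts as one round.
    pub fn and(&self, x: &[u8], y: &[u8]) -> Vec<u8> {
        assert_eq!(x.len(), y.len());
        self.rounds.set(self.rounds.get() + 1);
        x.iter().zip(y).map(|(a, b)| a & b).collect()
    }
}

/// AND k pairs of planes with one call each: k rounds.
pub fn and_per_plane(rc: &RoundCounter, xs: &[Vec<u8>], ys: &[Vec<u8>]) -> Vec<Vec<u8>> {
    xs.iter().zip(ys).map(|(x, y)| rc.and(x, y)).collect()
}

/// Concatenate the planes, AND once, then split again: a single round.
pub fn and_stacked(rc: &RoundCounter, xs: &[Vec<u8>], ys: &[Vec<u8>]) -> Vec<Vec<u8>> {
    if xs.is_empty() {
        return Vec::new();
    }
    let len = xs[0].len();
    assert!(
        len > 0 && xs.iter().chain(ys).all(|v| v.len() == len),
        "planes must be non-empty and of equal length"
    );
    let z = rc.and(&xs.concat(), &ys.concat());
    z.chunks(len).map(|c| c.to_vec()).collect()
}
```

Both functions return the same planes; under this model the stacked version needs one round instead of one per plane, at the price of the concatenation and split.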
So I'm not sure. Perhaps we could change the output of bit decomposition to a vector of tensors, while also providing a way to stack tensors into a single tensor for the operations that benefit from vectorization.
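A minimal plaintext sketch of that hybrid, assuming one `u8` per bit and no secret sharing: a hypothetical `bit_decompose` returns one plane per bit position (least significant bit first), and callers that want a single tensor opt in with `stack`/`unstack`. All names here are illustrative, not moose APIs.

```rust
/// Plaintext analogue of bit decomposition (hypothetical): returns one plane
/// per bit position, LSB first; plane i holds bit i of every input value.
pub fn bit_decompose(values: &[u64], n_bits: usize) -> Vec<Vec<u8>> {
    assert!(n_bits <= 64);
    (0..n_bits)
        .map(|i| values.iter().map(|v| ((*v >> i) & 1) as u8).collect())
        .collect()
}

/// Opt-in stacking for operations that benefit from a single tensor.
/// Returns the flat row-major buffer of shape [planes.len(), len] and `len`.
pub fn stack(planes: &[Vec<u8>]) -> (Vec<u8>, usize) {
    let len = planes.first().map_or(0, |p| p.len());
    assert!(planes.iter().all(|p| p.len() == len), "planes must have equal length");
    (planes.concat(), len)
}

/// Inverse of `stack`: split the flat buffer back into planes of length `len`.
pub fn unstack(flat: &[u8], len: usize) -> Vec<Vec<u8>> {
    if len == 0 {
        return Vec::new();
    }
    flat.chunks(len).map(|c| c.to_vec()).collect()
}

/// Recompose values from planes (LSB first); only used to check the sketch.
pub fn recompose(planes: &[Vec<u8>]) -> Vec<u64> {
    let len = planes.first().map_or(0, |p| p.len());
    (0..len)
        .map(|j| {
            planes
                .iter()
                .enumerate()
                .map(|(i, p)| (p[j] as u64) << i)
                .sum::<u64>()
        })
        .collect()
}
```

For example, decomposing `[5, 3]` into 3 bits yields the planes `[1, 1]`, `[0, 1]`, `[1, 0]`; stacking them gives one buffer for a vectorized step, and `unstack` restores the planes afterwards.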
@rdragos @voronaam @jvmncs maybe something to discuss next week