Slight optimization to choice of "anyRangePoint" in tensor.extract kernel
In https://github.com/google/heir/pull/2205, the tensor.extract kernel uses anyRangePoint to decide what to mask from the ciphertext, and then uses convert_layout to ensure the resulting scalar is packed according to the result layout chosen by layout-optimization.
Suppose we have a case where the ciphertext contains the value to extract in two separate slots, and one aligns exactly with the desired output layout:
ct = [x, x, 7, x, x, 7, x]
desired_result = [0, 0, 7, 0, 0, 0, 0]
In this case anyRangePoint may pick the second 7 to mask, which then costs one extra rotation to move it to the right final position. Instead, we could iterate over the desired result layout's range points, find which of those slots already hold the value in the ciphertext tensor, and mask all of the matching slots. At best this removes all rotations; at worst it pre-populates as many matching slots as possible, which should make the resulting convert_layout cheaper.
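A minimal sketch of the proposed selection, in plain Python rather than the actual kernel code. It assumes a layout can be reduced to the set of slot indices that hold the extracted element. The names `choose_mask_slots` and `build_mask` are hypothetical and are not part of HEIR, and falling back to the smallest source slot is only a stand-in for whatever anyRangePoint returns today.

```python
def choose_mask_slots(source_slots, target_slots):
    """Pick which ciphertext slots to keep when masking.

    source_slots: slots where the source layout already holds the value.
    target_slots: slots where the result layout wants the value.
    (Both are hypothetical simplifications of the real layout relations.)
    """
    # Prefer every slot that is already in its final position.
    matching = sorted(set(source_slots) & set(target_slots))
    if matching:
        return matching
    # No overlap: fall back to an arbitrary source slot, standing in
    # for the current anyRangePoint behavior.
    return [min(source_slots)]


def build_mask(num_slots, keep):
    """Build a 0/1 plaintext mask with ones at the kept slots."""
    keep = set(keep)
    return [1 if i in keep else 0 for i in range(num_slots)]


# The example from this issue: the value sits in slots 2 and 5,
# and the result layout wants it in slot 2 only.
slots = choose_mask_slots({2, 5}, {2})
mask = build_mask(7, slots)
```

With these inputs the chosen slot is 2 and the mask is `[0, 0, 1, 0, 0, 0, 0]`, so the masked ciphertext already matches desired_result and no rotation is needed in this simplified model.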
This issue has 1 outstanding TODO:
- lib/Transforms/ConvertToCiphertextSemantics/ConvertToCiphertextSemantics.cpp:894: use a smarter mask via the desired scalar result layout
This comment was autogenerated by todo-backlinks