Exclude sign bit from bitpacked encoding if all values are negative
In https://github.com/lancedb/lance/pull/2662 we added support for bitpacking signed integers in LanceV2. In https://github.com/lancedb/lance/pull/2696, an optimization was made to exclude the sign bit if all the values for a signed type are positive.
We can make a further optimization to exclude the sign bit if all the values are negative.
The way to do this could be to:
- Change the Bitpacked encoding proto message to have a flag indicating that all the values are negative: https://github.com/lancedb/lance/blob/35e38624b91a17c5202f38bb587d8b432914dd58/protos/encodings.proto#L176
- In `bitpack_params_for_signed_type`, add logic to determine if all values are negative, and if so, don't add the sign bit to the number of bits. We can also modify the return type `BitpackParams` as suggested here: https://github.com/lancedb/lance/blob/b9990d935b0c68ca7149ed8efe45d2f4ee3d9249/rust/lance-encoding/src/encodings/physical/bitpack.rs#L79
- In the decoder, determine whether all the values are negative from the new field on the encoding proto message, instead of checking the MSB of the encoded value like we do here: https://github.com/lancedb/lance/blob/b9990d935b0c68ca7149ed8efe45d2f4ee3d9249/rust/lance-encoding/src/encodings/physical/bitpack.rs#L440-L445
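
To make the idea concrete, here is a rough, standalone sketch of the all-negative case for `i64` values. It is not the existing Lance code: the helper names (`num_bits_if_all_negative`, `low_mask`, `pack_low_bits`, `unpack_all_negative`) are hypothetical, and it assumes two's-complement packing of the low bits of each value. The point is that when every value is negative, all the dropped high bits are 1, so the decoder can restore them from a flag instead of inspecting the MSB of the packed value.

```rust
/// Hypothetical helper: returns the bit width to pack with when every value
/// is negative, or `None` if the optimization does not apply.
fn num_bits_if_all_negative(values: &[i64]) -> Option<u32> {
    if values.is_empty() || values.iter().any(|v| *v >= 0) {
        return None;
    }
    // For a negative v, `!v` is non-negative; its significant bits are exactly
    // the low bits of v that are not implied by "all higher bits are 1".
    let bits = values
        .iter()
        .map(|v| 64 - (!*v as u64).leading_zeros())
        .max()
        .unwrap_or(0);
    // Assumption: keep at least one bit per value, even if all values are -1.
    Some(bits.max(1))
}

/// Mask selecting the lowest `num_bits` bits of a u64.
fn low_mask(num_bits: u32) -> u64 {
    if num_bits >= 64 {
        u64::MAX
    } else {
        (1u64 << num_bits) - 1
    }
}

/// Encoder side: keep only the low `num_bits` bits (no sign bit stored).
fn pack_low_bits(value: i64, num_bits: u32) -> u64 {
    (value as u64) & low_mask(num_bits)
}

/// Decoder side: the "all negative" flag (assumed to come from the encoding
/// proto message) tells us every dropped high bit was 1, so set them all.
fn unpack_all_negative(packed: u64, num_bits: u32) -> i64 {
    (packed | !low_mask(num_bits)) as i64
}
```

For example, for the values `-1`, `-2`, and `-100`, `num_bits_if_all_negative` returns `Some(7)`, one bit fewer than the 8 bits needed when a sign bit is stored, and `unpack_all_negative(pack_low_bits(-100, 7), 7)` gives back `-100`.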