KIVI
How should I understand this call: triton_quantize_and_pack_along_last_dim(value_states_full[:, :, :1, :].contiguous(), self.group_size, self.v_bits)?
I don't understand why the input is value_states_full[:, :, :1, :].contiguous() rather than value_states_full[:, :, :-1, :].transpose(2, 3).contiguous().
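To make the difference between the two expressions concrete, here is a minimal sketch that only compares the shapes they would produce. It assumes value_states_full is laid out as (batch, heads, tokens, head_dim), which is not confirmed by the snippet above, and the sizes are made up for illustration. It uses plain Python slice arithmetic instead of real tensors, and the helper names sliced_shape and transposed_shape are hypothetical, not part of KIVI.

```python
def sliced_shape(shape, index):
    """Shape after basic slicing, using Python's own slice resolution per dimension."""
    return tuple(len(range(dim)[s]) for dim, s in zip(shape, index))


def transposed_shape(shape, a, b):
    """Shape after swapping dimensions a and b (what transpose(a, b) does to a shape)."""
    out = list(shape)
    out[a], out[b] = out[b], out[a]
    return tuple(out)


# Assumed, illustrative sizes: batch=1, heads=2, tokens=5, head_dim=8
full_shape = (1, 2, 5, 8)
ALL = slice(None)

# value_states_full[:, :, :1, :]  -> keeps only the first token
first_token = sliced_shape(full_shape, (ALL, ALL, slice(None, 1), ALL))

# value_states_full[:, :, :-1, :].transpose(2, 3)  -> all tokens but the last, then swap dims 2 and 3
all_but_last = sliced_shape(full_shape, (ALL, ALL, slice(None, -1), ALL))
all_but_last_t = transposed_shape(all_but_last, 2, 3)

print(first_token)     # (1, 2, 1, 8)
print(all_but_last_t)  # (1, 2, 8, 4)
```

Under that assumed layout, :1 selects a single token and leaves head_dim as the last dimension, while :-1 followed by transpose(2, 3) selects every token except the last and makes the token axis the last dimension. Judging only by the function's name, it operates along the last dimension, so the two inputs would be grouped along different axes: head_dim in the first case, tokens in the second.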