intgemm
PrepareB but take integers instead of float
The current PrepareB function combines quantization and rearrangement, and the rearrangement depends on register length. We're going to want to distribute int8 models in an architecture-independent format (probably row major) and then rearrange them at load time. The Quantize function already converts to int8 without rearranging, so what's needed is an int8 rearrangement function.
Possibly with a preprocessing template, though that sounds complicated.
Also worth considering whether this should be done in place or by copying.
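A minimal C++ sketch of such an int8 rearrangement. It takes row-major int8 B (for example, what Quantize produces from a row-major float matrix) and copies it into a register-dependent order. The function name RearrangeB and its parameters (reg_bytes for the register width in bytes, col_block for how many columns are grouped together) are hypothetical, and the tiled layout is only illustrative, not necessarily the layout intgemm's PrepareB actually produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout, for illustration only: B is rows x cols, row-major,
// where rows is the inner (dot-product) dimension. The output is grouped into
// blocks of col_block columns; inside a block, for each run of reg_bytes
// consecutive rows, each column's reg_bytes values are stored contiguously,
// so one register load would fetch them.
std::vector<int8_t> RearrangeB(const int8_t *input, std::size_t rows,
                               std::size_t cols, std::size_t reg_bytes,
                               std::size_t col_block) {
  // This sketch does not handle ragged edges.
  assert(rows % reg_bytes == 0 && cols % col_block == 0);
  std::vector<int8_t> output(rows * cols);  // copying, not in place
  std::size_t out = 0;
  for (std::size_t c0 = 0; c0 < cols; c0 += col_block) {
    for (std::size_t r0 = 0; r0 < rows; r0 += reg_bytes) {
      for (std::size_t c = c0; c < c0 + col_block; ++c) {
        for (std::size_t r = r0; r < r0 + reg_bytes; ++r) {
          output[out++] = input[r * cols + c];
        }
      }
    }
  }
  return output;
}
```

The same row-major file could then be laid out at load time with, say, reg_bytes = 32 for AVX2 or 64 for AVX-512. This version copies into a fresh buffer; an in-place variant would have to follow the cycles of the permutation, which is more involved.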
Prepare B if B is quantized and transposed: https://github.com/kpu/intgemm/tree/prepare-b-quantized-transposed
Prepare B if B is transposed: https://github.com/kpu/intgemm/tree/prepare-b-transposed
I think we can merge them into master first and then try some optimizations.
Ooh
Merged prepare-b-quantized-transposed in 03a4a9dbe4e1955efdb6c6f671636d9378755f45
We also need PrepareB for the case where B is only quantized.
Also, as a slight enhancement (and probably a more important one from a performance point of view), it would be nice to have a combined transpose and quantize for PrepareA. The affine and dot operators take transA and transB as parameters. B is cached, so it's not a big deal, but A is not, which means A would be read from memory twice: once to transpose and once to quantize. A quantizeAndTranspose function would solve that.
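A scalar sketch of the fused quantizeAndTranspose idea for A: each float of A is read once, quantized, and written straight to its transposed position, so there is no second pass over A. The name QuantizeAndTranspose, the scale parameter quant_mult, and the quantization rule (multiply, saturate to [-127, 127], round half away from zero) are assumptions for illustration; a real kernel would also vectorize and tile the loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed quantization rule: scale, saturate to [-127, 127], then round
// half away from zero (std::lround).
inline int8_t QuantizeOne(float x, float quant_mult) {
  float scaled = std::max(-127.0f, std::min(127.0f, x * quant_mult));
  return static_cast<int8_t>(std::lround(scaled));
}

// Hypothetical fused kernel: reads A (rows x cols, row-major) exactly once
// and writes the quantized transpose (cols x rows, row-major) in the same
// pass, instead of transposing the floats and then quantizing them.
std::vector<int8_t> QuantizeAndTranspose(const float *a, std::size_t rows,
                                         std::size_t cols, float quant_mult) {
  assert(quant_mult > 0.0f);  // a positive scale is assumed
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[c * rows + r] = QuantizeOne(a[r * cols + c], quant_mult);
    }
  }
  return out;
}
```

The reads of A are sequential while the writes are strided; a blocked version would transpose small tiles in registers to keep both sides cache friendly.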
So do we need all combinations?
- PrepareB if B is quantized and transposed
- PrepareB if B is only transposed
- PrepareB if B is only quantized
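One way to keep the number of variants manageable, sketched here as an assumption rather than the planned design: every PrepareB variant first brings B into a single canonical form, row-major int8, and then all of them share one register-dependent rearrangement step (not shown). The helper names ToRowMajorInt8, NormalizeQuantizedB and NormalizeFloatB are hypothetical; transposed means the caller's buffer holds the transpose of B, and the float path uses an assumed quantization rule (scale, saturate to [-127, 127], round).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Produces B as rows x cols, row-major int8. If transposed is true, the
// input buffer holds B^T (cols x rows, row-major), so B[r][c] sits at
// index c * rows + r. convert maps one input element to int8.
template <typename T, typename Convert>
std::vector<int8_t> ToRowMajorInt8(const T *b, std::size_t rows,
                                   std::size_t cols, bool transposed,
                                   Convert convert) {
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      std::size_t src = transposed ? c * rows + r : r * cols + c;
      out[r * cols + c] = convert(b[src]);
    }
  }
  return out;
}

// B already quantized: copy the int8 values, undoing a transpose if needed.
std::vector<int8_t> NormalizeQuantizedB(const int8_t *b, std::size_t rows,
                                        std::size_t cols, bool transposed) {
  return ToRowMajorInt8(b, rows, cols, transposed,
                        [](int8_t v) { return v; });
}

// B still float: quantize while copying, undoing a transpose if needed.
std::vector<int8_t> NormalizeFloatB(const float *b, std::size_t rows,
                                    std::size_t cols, bool transposed,
                                    float quant_mult) {
  assert(quant_mult > 0.0f);  // a positive scale is assumed
  return ToRowMajorInt8(b, rows, cols, transposed, [quant_mult](float v) {
    float scaled = std::max(-127.0f, std::min(127.0f, v * quant_mult));
    return static_cast<int8_t>(std::lround(scaled));
  });
}
```

Under this split, the three cases in the list differ only in the element converter and the transposed flag, and only the shared final rearrangement would need a version per register width.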