intgemm

PrepareB but take integers instead of float

Open kpu opened this issue 5 years ago • 6 comments

The current PrepareB function combines quantization and rearrangement. The rearrangement depends on register length. We're going to want to distribute int8 models in an architecture-independent fashion (probably as row major), then have them rearranged at load. The Quantize function already converts to int8 format without rearranging. So what's needed is an int8 rearrangement function.

Possibly with a preprocessing template, though that sounds complicated.

Also worth considering if this should be done in-place or copying.

kpu avatar Nov 28 '19 19:11 kpu
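To make the split between quantization and rearrangement concrete, here is a minimal C++ sketch, not intgemm's actual code. `QuantizeRowMajorSketch`, `RearrangeInt8Sketch`, and the `width` parameter are hypothetical names, and the tiled layout is an illustrative stand-in for a register-length-dependent layout rather than the layout PrepareB really produces. Saturating to [-127, 127] is likewise an assumption of the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Step 1 (architecture-independent): scale, round, and saturate floats to int8.
// The element order is left untouched, so the result is still row major.
std::vector<int8_t> QuantizeRowMajorSketch(const std::vector<float>& in, float quant_mult) {
  std::vector<int8_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    float scaled = std::nearbyint(in[i] * quant_mult);
    out[i] = static_cast<int8_t>(std::max(-127.0f, std::min(127.0f, scaled)));
  }
  return out;
}

// Step 2 (architecture-dependent): copy a row-major int8 matrix (rows x cols)
// into a hypothetical tiled layout. Rows are grouped into blocks of `width`
// (standing in for the register length in bytes); within a block, the `width`
// values of each column are stored contiguously.
std::vector<int8_t> RearrangeInt8Sketch(const std::vector<int8_t>& in,
                                        std::size_t rows, std::size_t cols,
                                        std::size_t width) {
  if (width == 0 || rows % width != 0 || in.size() != rows * cols) {
    throw std::invalid_argument("rows must be a multiple of width and match in.size()");
  }
  std::vector<int8_t> out(in.size());
  for (std::size_t rb = 0; rb < rows / width; ++rb) {
    for (std::size_t c = 0; c < cols; ++c) {
      for (std::size_t k = 0; k < width; ++k) {
        out[(rb * cols + c) * width + k] = in[(rb * width + k) * cols + c];
      }
    }
  }
  return out;
}
```

This version copies. An in-place variant would avoid the second buffer at load time but needs a more careful permutation, which is the trade-off raised above.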

Prepare B if B is quantized and transposed: https://github.com/kpu/intgemm/tree/prepare-b-quantized-transposed

Prepare B if B is transposed: https://github.com/kpu/intgemm/tree/prepare-b-transposed

I think we can merge them into master first and then try to do some optimizations.

mateuszchudyk avatar Jan 20 '20 18:01 mateuszchudyk

Ooh

kpu avatar Jan 20 '20 18:01 kpu

Merged prepare-b-quantized-transposed in 03a4a9dbe4e1955efdb6c6f671636d9378755f45

kpu avatar Jan 21 '20 11:01 kpu

We need prepareB if B is only quantized too.

XapaJIaMnu avatar Jan 30 '20 16:01 XapaJIaMnu

Also, a slight enhancement: it would be nice (and probably more important from a performance point of view) to have a combined transpose and Quantize for prepareA. The affine and dot operators take transA and transB as parameters. B is cached, so it's not a big deal, but A is not, which means there would be two passes over A in memory. A quantizeAndTranspose would solve that.

XapaJIaMnu avatar Jan 30 '20 17:01 XapaJIaMnu
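A minimal scalar sketch of the fused idea, assuming A is a row-major float matrix: each float is read once and written, already quantized, to its transposed position. `QuantizeAndTransposeSketch` is a hypothetical name, not an existing intgemm function, and a real version would be vectorized and would write whatever layout PrepareA expects.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a row-major float matrix (rows x cols) once and writes the quantized
// transpose (cols x rows, row major) in the same pass, so A is not traversed
// a second time for the transposition.
std::vector<int8_t> QuantizeAndTransposeSketch(const std::vector<float>& in,
                                               std::size_t rows, std::size_t cols,
                                               float quant_mult) {
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      float scaled = std::nearbyint(in[r * cols + c] * quant_mult);
      // Saturation to [-127, 127] is an assumption of this sketch.
      float clamped = std::max(-127.0f, std::min(127.0f, scaled));
      out[c * rows + r] = static_cast<int8_t>(clamped);
    }
  }
  return out;
}
```

The reads are sequential and the writes are strided. Which side should be strided, and whether to process in cache-sized blocks, would need measuring.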

So we need all combinations?

  • PrepareB if B is quantized and transposed
  • PrepareB if B is only transposed
  • PrepareB if B is only quantized

mateuszchudyk avatar Jan 30 '20 22:01 mateuszchudyk
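One way to look at the combinations is as two independent properties of the input: whether it is already int8, and whether it is stored transposed. The sketch below is an illustration only. `NormalizeBSketch` and `ToInt8Sketch` are hypothetical names, and the real branches may well keep separate optimized code paths. It reduces every combination to quantized, non-transposed, row-major int8, after which a single rearrangement step could follow.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize a float; saturating to [-127, 127] is an assumption of this sketch.
inline int8_t ToInt8Sketch(float v, float quant_mult) {
  float scaled = std::nearbyint(v * quant_mult);
  return static_cast<int8_t>(std::max(-127.0f, std::min(127.0f, scaled)));
}

// Already-quantized input passes through unchanged.
inline int8_t ToInt8Sketch(int8_t v, float /*quant_mult*/) { return v; }

// rows and cols are the logical dimensions of B. If `transposed` is true, the
// input is stored as cols x rows, so element (r, c) lives at in[c * rows + r].
// T is float (not yet quantized) or int8_t (already quantized).
template <class T>
std::vector<int8_t> NormalizeBSketch(const T* in, std::size_t rows, std::size_t cols,
                                     bool transposed, float quant_mult) {
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      const T value = transposed ? in[c * rows + r] : in[r * cols + c];
      out[r * cols + c] = ToInt8Sketch(value, quant_mult);
    }
  }
  return out;
}
```

Here the float, non-transposed case corresponds to what PrepareB handles today, and the other three input forms correspond to the three cases in the list above.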