intgemm PrepareB but take integers instead of float

Open kpu opened this issue 5 years ago • 6 comments

The current PrepareB function combines quantization and rearrangement. The rearrangement depends on register length. We're going to want to distribute int8 models in an architecture-independent format (probably row-major) and then have them rearranged at load time. The Quantize function already converts to int8 without rearranging, so what's needed is an int8 rearrangement function.
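To make the request concrete, here is a minimal sketch of what an int8-only rearrangement step could look like. The function name `RearrangeQuantizedB`, the `reg_bytes` parameter, and the target layout (each column split into runs of `reg_bytes` consecutive rows) are all assumptions for illustration; this is not intgemm's actual memory layout or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of intgemm.
// Takes an already-quantized, row-major B (rows x cols) and copies it into an
// illustrative register-dependent layout: rows are grouped into blocks of
// reg_bytes, and within a block each column's reg_bytes values are contiguous.
std::vector<int8_t> RearrangeQuantizedB(const std::vector<int8_t>& row_major,
                                        std::size_t rows, std::size_t cols,
                                        std::size_t reg_bytes) {
  assert(row_major.size() == rows * cols);
  assert(reg_bytes != 0 && rows % reg_bytes == 0);  // assumed: no padding handled
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    const std::size_t block = r / reg_bytes;  // which group of rows
    const std::size_t lane = r % reg_bytes;   // position inside the register
    for (std::size_t c = 0; c < cols; ++c) {
      out[(block * cols + c) * reg_bytes + lane] = row_major[r * cols + c];
    }
  }
  return out;
}
```

Because only `reg_bytes` changes between architectures in this sketch, the same row-major int8 file could be loaded everywhere and rearranged once at load time. This version copies; an in-place variant would need a permutation-cycle walk and is not shown.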

Possibly with a preprocessing template, though that sounds complicated.

Also worth considering whether this should be done in place or by copying.

Nov 28 '19 19:11 kpu

Prepare B if B is quantized and transposed: https://github.com/kpu/intgemm/tree/prepare-b-quantized-transposed

Prepare B if B is transposed: https://github.com/kpu/intgemm/tree/prepare-b-transposed

I think we can merge them into master first and then try some optimizations.

Jan 20 '20 18:01 mateuszchudyk

Ooh

Jan 20 '20 18:01 kpu

Merged prepare-b-quantized-transposed in 03a4a9dbe4e1955efdb6c6f671636d9378755f45

Jan 21 '20 11:01 kpu

We also need PrepareB for the case where B is only quantized.

Jan 30 '20 16:01 XapaJIaMnu

Also, a slight enhancement: it would be nice (and probably more important from a performance point of view) to have a combined transpose and Quantize for PrepareA. The affine and dot operators take transA and transB as parameters. B is cached, so it's not a big deal there, but A is not, which means A would be read from memory twice (once to transpose, once to quantize). A quantizeAndTranspose function would solve that.
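As a rough illustration of the single-pass idea, the scalar sketch below reads each float of A once, quantizes it, and writes it straight to its transposed position. The name `QuantizeAndTranspose`, the clamp to [-127, 127], and the use of `std::nearbyint` for rounding are assumptions made for this example; a real implementation would be vectorized and would have to match the library's own rounding and saturation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fused helper, not part of intgemm.
// input is row-major (rows x cols) float; the result is row-major
// (cols x rows) int8, i.e. the quantized transpose, produced in one pass.
std::vector<int8_t> QuantizeAndTranspose(const std::vector<float>& input,
                                         std::size_t rows, std::size_t cols,
                                         float quant_mult) {
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Scale and round using the current rounding mode.
      float scaled = std::nearbyint(input[r * cols + c] * quant_mult);
      // Saturate to a symmetric int8 range (assumed here).
      scaled = std::max(-127.0f, std::min(127.0f, scaled));
      // Write directly to the transposed position.
      out[c * rows + r] = static_cast<int8_t>(scaled);
    }
  }
  return out;
}
```

The point of fusing is that the float input is touched only once; a separate transpose followed by Quantize would stream A through memory twice.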

Jan 30 '20 17:01 XapaJIaMnu

So we need all combinations?

PrepareB if B is quantized and transposed
PrepareB if B is only transposed
PrepareB if B is only quantized

Jan 30 '20 22:01 mateuszchudyk
