intgemm
int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991
The current PrepareB function combines quantization and rearrangement. The rearrangement is dependent on register length. We're going to want to distribute int8 models in an architecture-independent fashion (probably as row...
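To make the proposed split concrete, here is a minimal sketch of quantization and rearrangement as two separate steps. It is not intgemm's implementation: `QuantizeRowMajor` and `TransposeToColumnMajor` are hypothetical names, the saturation range of ±127 is an assumption, and a plain transpose stands in for the real register-length-dependent layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of intgemm): architecture-independent step.
// Scales each float by quant_mult, rounds to the nearest integer and
// saturates to [-127, 127]. The output keeps the input's row-major order,
// so it could be stored and distributed as-is.
std::vector<int8_t> QuantizeRowMajor(const std::vector<float> &input, float quant_mult) {
  std::vector<int8_t> out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    float scaled = std::nearbyint(input[i] * quant_mult);
    scaled = std::max(-127.0f, std::min(127.0f, scaled));
    out[i] = static_cast<int8_t>(scaled);
  }
  return out;
}

// Hypothetical helper (not part of intgemm): stand-in for the
// architecture-dependent step. A real rearrangement depends on register
// length; a simple transpose to column-major is used here only to show that
// this step can run at load time on already-quantized data.
std::vector<int8_t> TransposeToColumnMajor(const std::vector<int8_t> &row_major,
                                           std::size_t rows, std::size_t cols) {
  assert(row_major.size() == rows * cols);
  std::vector<int8_t> out(row_major.size());
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[c * rows + r] = row_major[r * cols + c];
    }
  }
  return out;
}
```

Under this split, only the output of the first function would need to be shipped with a model; the second would run once per machine after loading.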
If there's only one choice of Callback, don't make it a template argument. Why would anybody want to provide something other than 1?
We have two means of creating functions for different architectures: macros (as currently used for multiplication) and repeated inclusion (as currently used for kernels). They're both sort of hacky, but...
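For readers unfamiliar with the first technique, the following is a rough sketch of macro-based generation, not intgemm's actual macros. The macro `DEMO_DEFINE_SATURATE`, the namespaces `demo_sse2` and `demo_avx2`, and the function `Saturate` are all hypothetical, and the per-architecture target attributes a real build would need are omitted.

```cpp
#include <cstdint>

// Hypothetical macro (not from intgemm): stamps out one copy of the same
// function body inside the namespace passed as `ns`. In a real
// multi-architecture build, each copy would also carry its own target
// attribute; that is left out to keep the sketch portable.
#define DEMO_DEFINE_SATURATE(ns)            \
  namespace ns {                            \
  inline int8_t Saturate(int32_t value) {   \
    if (value > 127) return 127;            \
    if (value < -127) return -127;          \
    return static_cast<int8_t>(value);      \
  }                                         \
  }

// One expansion per (hypothetical) architecture namespace.
DEMO_DEFINE_SATURATE(demo_sse2)
DEMO_DEFINE_SATURATE(demo_avx2)

#undef DEMO_DEFINE_SATURATE
```

Repeated inclusion reaches the same result differently: the function body lives in its own file, which is `#include`d once per architecture after redefining a macro that names the target namespace.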
The top-level API needs to be documented. These functions are just there with no explanation. https://github.com/kpu/intgemm/blob/5979a4bd31f3479a9f627051ae9fd1a36292bb27/intgemm.h#L256 https://github.com/kpu/intgemm/blob/5979a4bd31f3479a9f627051ae9fd1a36292bb27/intgemm.h#L261 https://github.com/kpu/intgemm/blob/5979a4bd31f3479a9f627051ae9fd1a36292bb27/intgemm.h#L237 https://github.com/kpu/intgemm/blob/5979a4bd31f3479a9f627051ae9fd1a36292bb27/intgemm.h#L229
``` In file included from /home/heafield/intgemm/./sse2_gemm.h:6:0, from /home/heafield/intgemm/./intgemm.h:48, from /home/heafield/intgemm/test/multiply_test.cc:4: /home/heafield/intgemm/./multiply.h: In function ‘static void intgemm::AVX512_16bit::Multiply(const int16_t*, const int16_t*, float*, PostprocessPipeline, intgemm::Index, intgemm::Index, intgemm::Index) [with PostprocessPipeline = std::tuple]’: /home/heafield/intgemm/./multiply.h:16:50: internal...