intgemm
int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991
I am trying to get serenade.ai to execute natively on an M1 Mac (not under Rosetta). This is one of a very small number of dependencies that cannot be compiled at...
This PR does two things: ~1) Changes the standard to C++17. Marian already uses it, so there is no reason to stay on C++11. gcc 5 supports almost the full...
Attempt the wormhole instruction and check results. Use this like CPUID to dispatch wormhole and non-wormhole versions.
It's not a _purr_fect implementation, but it is a start... This patch implements the following: - PrepareB for matrices with an arbitrary number of columns, for all architectures. The last non-multiple-of-eight-columns are prepared...
While compiling intgemm with one of the latest versions of the ICC (icpc (ICC) 19.1.3.304) I got the following result: ``` benchmarks/../intgemm/callbacks/implementations.inl(47): error #3632: "target" attribute on special function is...
Marian seems to be moving to using CMake install targets https://github.com/marian-nmt/marian-dev/issues/862 and intgemm doesn't work as an install target. It won't work, because after we add this to the cmake...
Allow parts of matrices to have different quantization multipliers: https://github.com/marian-nmt/marian-dev/blob/master/src/tensors/cpu/fbgemm/packed_gemm.cpp#L368
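As a rough illustration of what "different quantization multipliers" for parts of a matrix could mean, here is a minimal sketch that quantizes each column of a row-major float matrix with its own multiplier and later divides the integer result by the matching scale. Per-column granularity, the saturation range, and the helper names `QuantizeColumns` and `UnquantizeColumn` are all assumptions made for this example; they are not intgemm functions and do not describe the linked Marian code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of intgemm): quantize a row-major
// rows x cols float matrix to int8, using one multiplier per column.
std::vector<int8_t> QuantizeColumns(const std::vector<float> &input,
                                    std::size_t rows, std::size_t cols,
                                    const std::vector<float> &multipliers) {
  assert(input.size() == rows * cols);
  assert(multipliers.size() == cols);
  std::vector<int8_t> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Scale by this column's multiplier, then round to nearest.
      float scaled = std::nearbyint(input[r * cols + c] * multipliers[c]);
      // Saturate to [-127, 127] (assumed symmetric range).
      if (scaled > 127.0f) scaled = 127.0f;
      if (scaled < -127.0f) scaled = -127.0f;
      out[r * cols + c] = static_cast<int8_t>(scaled);
    }
  }
  return out;
}

// Hypothetical helper: map one int32 accumulator of A * B back to float.
// The scale of the product is the multiplier used for A times the
// multiplier of the B column that produced this output element.
float UnquantizeColumn(int32_t value, float a_multiplier,
                       float b_column_multiplier) {
  return static_cast<float>(value) / (a_multiplier * b_column_multiplier);
}
```

With one multiplier per column, columns with small values can use a larger scale than columns with large values, at the cost of carrying a vector of scales into the unquantize step instead of a single scalar.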
`srai_epi16` expects an immediate shift value, not a variable. Here it is called with a variable: https://github.com/kpu/intgemm/blob/61bcbae423eab96156f646a92107ca5300b8ae27/kernels/implementations.inl#L308-L309 And the caller is very much using a variable: https://github.com/kpu/intgemm/blob/61bcbae423eab96156f646a92107ca5300b8ae27/test/kernels/multiply_sat_test.cc#L24-L25 I don't know...
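For context on the immediate-versus-variable distinction, the sketch below shows one possible way to shift by a run-time count: SSE2 also provides `_mm_sra_epi16`, which reads the count from a vector register rather than from an immediate. The function name `ShiftRightArithmetic` is hypothetical, the code assumes a non-negative shift, and nothing here is taken from intgemm or implied to be how the issue was resolved. A scalar fallback is included so the example also builds on targets without SSE2.

```cpp
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical helper (not intgemm code): arithmetic right shift of
// eight int16 lanes by a count known only at run time.
// Assumes shift >= 0.
void ShiftRightArithmetic(const int16_t *in, int16_t *out, int shift) {
#ifdef SKETCH_HAVE_SSE2
  __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(in));
  // _mm_srai_epi16 is documented with an immediate count;
  // _mm_sra_epi16 takes the count from the low 64 bits of a register.
  __m128i count = _mm_cvtsi32_si128(shift);
  _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
                   _mm_sra_epi16(v, count));
#else
  // Scalar fallback: counts above 15 behave like 15 (sign fill),
  // matching the vector instruction.
  int s = shift > 15 ? 15 : shift;
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<int16_t>(in[i] >> s);
  }
#endif
}
```

The other common option is to make the shift a template parameter so that it really is a compile-time constant at the intrinsic call; which approach fits depends on whether callers know the shift at compile time.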
On ssse3 (tested on the mac) ``` Arch: any Matrix size: M: 1024 K: 1024 N: 1024 in loop, for 1000 interations: dnnl s8s8s32 gemm took: 160.7630360000 seconds. dnnl u8s8s32...
The following tests FAILED: 1 - PrepareBias SSSE3 (Failed) 2 - PrepareBias AVX2 (Failed) 15 - Multiply SSSE3 8bit Shift vs Int (Failed) 16 - Multiply AVX2 8bit Shift vs...