intgemm

int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991

Results 15 intgemm issues

I am trying to get serenade.ai to execute natively on an m1 mac (not rosetta). This is one of a very small number of dependencies that cannot be compiled at...

This PR does two things: ~1) Changes the standard to C++17. Marian already uses that, there's no reason why we should continue with 11. gcc 5 supports almost the full...

Attempt the wormhole instruction and check results. Use this like CPUID to dispatch wormhole and non-wormhole versions.

It's not a _purr_fect implementation, but it is a start... This patch implements the following: - PrepareB for arbitrary columns matrices for all architectures. The last non-multiple-of-eight-columns are prepared...

While compiling intgemm with one of the latest versions of the ICC (icpc (ICC) 19.1.3.304) I got the following result: ``` benchmarks/../intgemm/callbacks/implementations.inl(47): error #3632: "target" attribute on special function is...

Marian seems to be moving to using CMake install targets https://github.com/marian-nmt/marian-dev/issues/862 and intgemm doesn't work as an install target. It won't work, because after we add this to the cmake...

Allow parts of matrices to have different quantization multipliers: https://github.com/marian-nmt/marian-dev/blob/master/src/tensors/cpu/fbgemm/packed_gemm.cpp#L368
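One possible reading of this request is a separate multiplier per column of the matrix being quantized. The sketch below illustrates that idea in plain C++. `QuantizePerColumn` is a hypothetical helper, not an intgemm or fbgemm function, and the row-major layout and saturation to [-127, 127] are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of intgemm): quantize a row-major
// rows x cols float matrix to int8 using one multiplier per column.
std::vector<std::int8_t> QuantizePerColumn(const std::vector<float> &input,
                                           std::size_t rows, std::size_t cols,
                                           const std::vector<float> &multipliers) {
  assert(input.size() == rows * cols);
  assert(multipliers.size() == cols);
  std::vector<std::int8_t> output(input.size());
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Scale by the multiplier that belongs to this column, then round.
      float scaled = std::nearbyint(input[r * cols + c] * multipliers[c]);
      // Saturate to [-127, 127] (assumed symmetric range).
      scaled = std::max(-127.0f, std::min(127.0f, scaled));
      output[r * cols + c] = static_cast<std::int8_t>(scaled);
    }
  }
  return output;
}
```

With per-column multipliers, the matching unquantization step would have to divide each output column by its own multiplier rather than by a single global one.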

`srai_epi16` expects an immediate shift value, not a variable. Here it is called with a variable: https://github.com/kpu/intgemm/blob/61bcbae423eab96156f646a92107ca5300b8ae27/kernels/implementations.inl#L308-L309 And the caller is very much using a variable: https://github.com/kpu/intgemm/blob/61bcbae423eab96156f646a92107ca5300b8ae27/test/kernels/multiply_sat_test.cc#L24-L25 I don't know...
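To illustrate the distinction raised here, the sketch below shows two ways to do a 16-bit arithmetic right shift with SSE2 intrinsics. It is not the intgemm kernel code. `ShiftRightImmediate`, `ShiftRightVariable`, and `FirstLane` are hypothetical names, and it assumes an x86 target with SSE2. The first form makes the count a compile-time constant via a template parameter, which is what `_mm_srai_epi16` requires. The second uses `_mm_sra_epi16`, which takes the count from a vector register and so accepts a runtime value.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Immediate form: Shift is a template parameter, so the count passed to
// _mm_srai_epi16 is always a compile-time constant.
template <int Shift>
inline __m128i ShiftRightImmediate(__m128i v) {
  return _mm_srai_epi16(v, Shift);
}

// Variable form: _mm_sra_epi16 reads the count from the low 64 bits of a
// vector register, so a runtime value is allowed.
inline __m128i ShiftRightVariable(__m128i v, int shift) {
  return _mm_sra_epi16(v, _mm_cvtsi32_si128(shift));
}

// Hypothetical helper for inspecting results: returns the lowest 16-bit lane.
inline std::int16_t FirstLane(__m128i v) {
  return static_cast<std::int16_t>(_mm_extract_epi16(v, 0));
}
```

For example, both `ShiftRightImmediate<2>(_mm_set1_epi16(-64))` and `ShiftRightVariable(_mm_set1_epi16(-64), 2)` leave -16 in every lane. Which form suits the kernel in question depends on whether the shift is known at compile time.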

On ssse3 (tested on the mac) ``` Arch: any Matrix size: M: 1024 K: 1024 N: 1024 in loop, for 1000 interations: dnnl s8s8s32 gemm took: 160.7630360000 seconds. dnnl u8s8s32...

The following tests FAILED: 1 - PrepareBias SSSE3 (Failed) 2 - PrepareBias AVX2 (Failed) 15 - Multiply SSSE3 8bit Shift vs Int (Failed) 16 - Multiply AVX2 8bit Shift vs...