Kenneth Heafield

290 comments by Kenneth Heafield

I've cleaned it up! 16-bit and 8-bit on SSSE3, AVX2, and AVX512BW. https://github.com/kpu/intgemm

@emjotde I'm trying to merge but running into a B matrix shaped 512x735. The number of columns is not a multiple of 8. It's coming from shortlisting: https://github.com/marian-nmt/marian-dev/blob/master/src/layers/generic.h#L153 I also...
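One possible workaround for a column count such as 735 is to zero-pad B up to the next multiple of 8 and discard the extra output columns afterwards. The sketch below only illustrates that idea. `RoundUp` and `PadColumns` are hypothetical helpers, not part of intgemm or Marian, and nothing here says this is how the shortlisting case was eventually handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `multiple` (e.g. 735 -> 736 for 8).
inline std::size_t RoundUp(std::size_t n, std::size_t multiple) {
  assert(multiple > 0);
  return (n + multiple - 1) / multiple * multiple;
}

// Hypothetical helper (an assumption, not an intgemm API): copy a row-major
// rows x cols matrix into a buffer whose column count is padded with zeros
// up to a multiple of `multiple`.
inline std::vector<float> PadColumns(const std::vector<float> &in,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t multiple = 8) {
  assert(in.size() == rows * cols);
  const std::size_t padded = RoundUp(cols, multiple);
  std::vector<float> out(rows * padded, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[r * padded + c] = in[r * cols + c];
    }
  }
  return out;
}
```

Because the padded columns are all zero, they produce zero outputs that can be dropped after the multiply without changing the remaining results.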

I've pushed most of an integration to 85ad45efad278e4337c4919fe1a7cf0544b678a3. Quantization has been split into PrepareA and PrepareB (which includes the "transpose" but actually turns it into something complicated). Can you make...

Just disabling shortlisting for now in my tests...

36d188e6f0dee36e35553f2f1dcf51d32dcecb5e works when shortlisting is disabled with int16. WNMT run.cpu.sh on dagr, which has AVX2, with shortlists disabled: master 25.59, intgemm 25.64. Insignificant BLEU improvement is good (there are some...

Dynamic quantization works on an unclipped model, namely /fs/hoenir0/heafield/wnmt/cpu/wnmt/model/model.npz which @emjotde had placed on the CPU Amazon machine under ~/wnmt/model/model.npz . 8-bit, no shortlists: dynamic quantization to 127.0f / max(|value|):...
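As a rough illustration of the 127.0f / max(|value|) multiplier mentioned above, here is a minimal scalar sketch. `QuantMult` and `Quantize` are hypothetical names, not the intgemm API. The rounding mode, the clamp to [-127, 127], and the fallback multiplier for an all-zero input are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Multiplier 127.0f / max(|value|). Returning 1.0f for an all-zero input is
// an assumption made here to avoid dividing by zero.
inline float QuantMult(const std::vector<float> &values) {
  float max_abs = 0.0f;
  for (float v : values) max_abs = std::max(max_abs, std::fabs(v));
  return max_abs > 0.0f ? 127.0f / max_abs : 1.0f;
}

// Scale each value, round to the nearest integer (current rounding mode),
// and clamp to [-127, 127] before narrowing to 8 bits.
inline std::vector<std::int8_t> Quantize(const std::vector<float> &values,
                                         float mult) {
  assert(mult > 0.0f);
  std::vector<std::int8_t> out;
  out.reserve(values.size());
  for (float v : values) {
    float scaled = std::nearbyint(v * mult);
    scaled = std::min(127.0f, std::max(-127.0f, scaled));
    out.push_back(static_cast<std::int8_t>(scaled));
  }
  return out;
}
```

Dividing a quantized value by the same multiplier gives an approximation of the original float, which is how the result of an integer multiply would be scaled back.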

It compiles a fat binary with support for multiple vector lengths. At load time it initializes functions based on CPUID. In other words, it works on CPUs all the way...
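The load-time selection described above can be sketched roughly as follows, assuming GCC or Clang on x86, where `__builtin_cpu_supports` wraps the CPUID query. The kernel functions are placeholders that only return a name, and the generic fallback is an assumption of this sketch. None of these names come from intgemm.

```cpp
#include <cassert>

// Placeholder kernels. In a real fat binary each one would be compiled for
// its own instruction set; here they only report which path was chosen.
inline const char *KernelAVX512BW() { return "avx512bw"; }
inline const char *KernelAVX2() { return "avx2"; }
inline const char *KernelSSSE3() { return "ssse3"; }
inline const char *KernelGeneric() { return "generic"; }  // assumed fallback

using KernelFn = const char *(*)();

// Pick the widest supported implementation by querying the CPU once.
inline KernelFn ChooseKernel() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // needed when called from a static initializer
  if (__builtin_cpu_supports("avx512bw")) return KernelAVX512BW;
  if (__builtin_cpu_supports("avx2")) return KernelAVX2;
  if (__builtin_cpu_supports("ssse3")) return KernelSSSE3;
#endif
  return KernelGeneric;
}

// Initialized once when the program is loaded, before main() runs.
static const KernelFn kKernel = ChooseKernel();

inline const char *ActiveKernelName() {
  assert(kKernel != nullptr);
  return kKernel();
}
```

Callers go through the function pointer, so the CPU check is paid once at startup rather than on every call.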

Looks like the assembler doesn't want to support AVX512BW and AVX512DQ instructions even though the compiler has the intrinsics for them. Valhalla is running Ubuntu 16.04 and it compiles fine...

Can you send me `as --version`? I think your binutils is ancient.

I've added a cmake compilation test to guard against older compilers and assemblers, omitting avx512 in such cases. gcc 5.4.1 and gcc 7.2.0 are fine, but your binutils is too...