candle
[AArch64] Quantized MatMul performance improvement
This change implements a new matmul kernel based on the Armv8.6 i8mm instructions. Since it requires a nightly compiler, the more performant version is opt-in via the new arm-nightly-feat feature flag.
I have also added the Armv8.4 dotprod instructions under this flag.
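For illustration only (this is not the kernel added by the change), the sketch below shows how the Armv8.4 SDOT instruction, exposed in Rust as the `vdotq_s32` NEON intrinsic, accumulates 4-element i8 dot products, which is the building block of a quantized matmul inner loop; the i8mm path follows the same idea with the SMMLA intrinsic (`vmmlaq_s32`), producing 2x2 i32 tiles per instruction. Depending on the toolchain, these intrinsics may still be nightly-gated, which is why the feature flag above requires a nightly compiler. The helper names here (`dot_i8`, `sdot_example`) are made up for the example.

```rust
// Minimal sketch of an i8 dot product using the Armv8.4 dotprod (SDOT)
// NEON intrinsic. Only compiles for aarch64 targets.
#[cfg(target_arch = "aarch64")]
mod sdot_example {
    use core::arch::aarch64::*;

    /// Dot product of two i8 slices whose length is a multiple of 16.
    ///
    /// # Safety
    /// Must only be called when the `dotprod` target feature is available.
    #[target_feature(enable = "dotprod")]
    pub unsafe fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
        assert_eq!(a.len(), b.len());
        assert_eq!(a.len() % 16, 0);
        let mut acc = vdupq_n_s32(0);
        for i in (0..a.len()).step_by(16) {
            let va = vld1q_s8(a.as_ptr().add(i)); // 16 x i8
            let vb = vld1q_s8(b.as_ptr().add(i)); // 16 x i8
            // SDOT: each of the 4 i32 lanes accumulates a 4-element i8 dot product.
            acc = vdotq_s32(acc, va, vb);
        }
        vaddvq_s32(acc) // horizontal sum of the 4 partial sums
    }
}

#[cfg(target_arch = "aarch64")]
fn main() {
    if std::arch::is_aarch64_feature_detected!("dotprod") {
        let a = vec![1i8; 64];
        let b = vec![2i8; 64];
        // Safe to call: we just verified dotprod support at runtime.
        let d = unsafe { sdot_example::dot_i8(&a, &b) };
        assert_eq!(d, 128);
    }
}

#[cfg(not(target_arch = "aarch64"))]
fn main() {}
```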
Performance improvement:
- +10% with only dotprod enabled (measured on LLaMa2)
- +40-60% for the i8mm-based matmul (measured on quantized Whisper; different cores produced different results)