[AArch64] Quantized MatMul performance improvement

Open bgergely0 opened this issue 1 year ago • 0 comments

This change implements a new quantized matmul kernel based on the Armv8.6 i8mm instructions. Because it requires a nightly compiler, the faster version is optional and sits behind the new arm-nightly-feat feature flag.

I have also added the Armv8.4 dotprod instructions under this flag.
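For readers unfamiliar with the two instruction families, here is a rough scalar model of what a single SDOT (dotprod) and a single SMMLA (i8mm) instruction compute on 128-bit registers. This is only an illustrative sketch, not the code in this change: the function names `sdot_model` and `smmla_model` are made up for the example, and the real kernels use the corresponding NEON intrinsics rather than loops.

```rust
/// Scalar model of one SDOT instruction (dotprod extension).
/// Each of the four i32 accumulator lanes receives the dot product of
/// the matching group of four i8 values from `a` and `b`.
/// (Hypothetical helper name, for illustration only.)
fn sdot_model(acc: &mut [i32; 4], a: &[i8; 16], b: &[i8; 16]) {
    for lane in 0..4 {
        for k in 0..4 {
            acc[lane] += a[4 * lane + k] as i32 * b[4 * lane + k] as i32;
        }
    }
}

/// Scalar model of one SMMLA instruction (i8mm extension).
/// `a` is read as a 2x8 row-major matrix, `b` as two rows of 8 values
/// that form the columns of an 8x2 matrix. The 2x2 i32 product is
/// accumulated into `acc`, stored row-major.
/// (Hypothetical helper name, for illustration only.)
fn smmla_model(acc: &mut [i32; 4], a: &[i8; 16], b: &[i8; 16]) {
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..8 {
                acc[2 * i + j] += a[8 * i + k] as i32 * b[8 * j + k] as i32;
            }
        }
    }
}
```

In this model one SDOT performs 16 multiply-accumulates and one SMMLA performs 32, which is consistent with the i8mm path being the faster of the two. The measured gains depend on much more than this count, so treat it as intuition only.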

Performance improvement:

  • +10% with only dotprod enabled (measured on LLaMa2)
  • +40-60% for the i8mm-based matmul (measured on quantized Whisper; the gain varied between cores)

bgergely0, Oct 03 '24 08:10