Ma Mingfei

Results: 93 comments of Ma Mingfei

> Here is my suggestion:
>
> 1. Update the README.md.
>    Explain the conditions under which AMX will be used to speed up inference, like hardware and build parameters.
>    How...

**Updates**: f16 support added. Right now this patch only has an avx512 kernel, which does FMA with `avx512f` (did not use `avx512-fp16` here, as `_mm512_fmadd_ph` does accumulation with...

> I think it would be better to leave the implementation as is instead of moving it to a different backend; the performance would be slightly better, and I don't...

> Is there any progress? I am really looking forward to the AMX support.

Recently I got distracted by some other tasks; I am using my spare time to work on...

Added AMX and VNNI kernels for `Q4_K`, `Q5_K`, `Q6_K`, `IQ4_XS`.

> I like that the code is very well isolated from the rest of the codebase. Haven't reviewed the `mmq.cpp` source in detail yet, and it would be difficult without...

> Hi, I noticed some quantization issues in `mmq.cpp`. https://github.com/mingfeima/llama.cpp/blob/74bb1eb52be7d9b9eb484d156d24a474dd09f278/ggml/src/ggml-amx/mmq.cpp#L1183-L1195 Here, we are using a single scale `vd0` for all 16x32 weights. However, Q8_0 uses a scale parameter per `blck_size=32` elements...

@ggerganov On Azure, `DCesv5` and `ECesv5` instances have Intel AMX support; they are all 4th-gen Xeon (codename Sapphire Rapids): https://azure.microsoft.com/en-us/updates/confidential-vms-with-intel-tdx-dcesv5-ecesv5/ https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dcesv5-series?tabs=sizebasic Is it possible to use those instances for...

CC @malfet @jgong5 @xujuntwt95329: do we have a test plan for this one?