kirnat
You can try [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
True, ik doesn't have any specific AMX optimization, but going by the benchmark results from this repo (which may be outdated), I don't really see it outperforming ik on either...
While I’d be excited to see AMX support, I can’t say the kTransformers Qwen3 benchmark proves its usefulness. I can’t verify the pp/tg window sizes or the exact model they...
### Confirming AMX buffer

```
llama.cpp/build/bin/llama-cli -m ./models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
```

```
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors:   CPU_Mapped model buffer size =  4685.30 MiB
load_tensors:          AMX model...
```
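Independent of the llama.cpp load log, you can check whether the CPU itself advertises AMX. A minimal sketch for Linux, assuming the standard kernel cpuinfo flag names (`amx_tile`, `amx_int8`, `amx_bf16`):

```shell
# List any Intel AMX feature flags the CPU advertises (Linux).
# On AMX-capable parts (Sapphire Rapids and later) you should see
# amx_tile, amx_int8, and amx_bf16; otherwise nothing matches.
grep -om1 'amx[a-z0-9_]*' /proc/cpuinfo || echo "no AMX flags found"
```

If no flags show up here, llama.cpp will not allocate an AMX buffer regardless of how it was built.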
I hadn't even considered it for CPU-only inference. I've used it a lot day to day for hybrid inference, with great results. Same settings as above and GPU hidden...