Results 8 comments of aubreyli

AMX only takes effect during the prefill phase, and only if the batch size is large enough. In your case, it does not participate in the decode phase. Also, BF16 provides higher...

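> On my 6th-gen Xeon with 2 Intel Arc A770 cards, running the full version with Q4 quantization, I get 1.9~2.9 t/s with 44 threads. I wonder whether everyone else sees the same.

What's your OS distro and version? And what are your memory size and speed? Different software component versions could cause different performance...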

Please refer to issue #1329. According to your hardware configuration, your decode speed should be around 5 to 6 tokens per second.

You need to download the BF16 GGUF. See the [AMX doc](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/AMX.md).

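> > You need to download the BF16 GGUF. See the [AMX doc](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/AMX.md).
>
> Let me confirm: I'm currently using the q4_K_M GGUF. Do I need to re-download the BF16 version of the GGUF and replace the files?

Yes.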

> Is that the version that needs 1.3T in total?

If you have to use DeepSeek 671B, then yes. Otherwise, you might want to try AMX with a smaller model, such as the Qwen3-30B BF16 GGUF.
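For reference, the 1.3T figure is consistent with simple arithmetic: BF16 stores each parameter in 2 bytes. The sketch below is a rough estimate only. It assumes nominal parameter counts (671e9 and 30e9) and ignores GGUF metadata and any tensors stored in another format; `bf16_weight_bytes` is a hypothetical helper written for this illustration, not part of ktransformers.

```python
BYTES_PER_BF16 = 2  # BF16 is a 16-bit format, so 2 bytes per parameter


def bf16_weight_bytes(num_params: float) -> float:
    """Approximate raw weight size in bytes if every parameter is BF16."""
    return num_params * BYTES_PER_BF16


def to_terabytes(num_bytes: float) -> float:
    """Convert bytes to decimal terabytes (1 TB = 1e12 bytes)."""
    return num_bytes / 1e12


if __name__ == "__main__":
    # Nominal parameter counts; assumed here for illustration.
    models = [("DeepSeek 671B", 671e9), ("Qwen3-30B", 30e9)]
    for name, params in models:
        size_tb = to_terabytes(bf16_weight_bytes(params))
        print(f"{name}: about {size_tb:.2f} TB in BF16")
```

By this estimate the 671B model needs roughly 1.34 TB of weights, while a 30B model needs roughly 0.06 TB (60 GB), which is why the smaller model is the easier way to try AMX.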

> Is there a timeline for the AMXInt4 backend?

Currently, AMX hardware mainly supports BF16 and INT8 formats. If you have low-precision weights (such as 4-bit), they must first be...
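To see which AMX formats a given machine reports, one option on Linux is to look for the `amx_tile`, `amx_bf16`, and `amx_int8` CPU flags in `/proc/cpuinfo`. The following is a minimal sketch under that assumption. `amx_flags` and `read_cpuinfo` are hypothetical helpers for this illustration and are not part of ktransformers; on non-Linux systems the file is absent and the sketch simply reports no flags.

```python
from pathlib import Path

# CPU flag names as reported by the Linux kernel in /proc/cpuinfo.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def amx_flags(cpuinfo_text: str) -> set[str]:
    """Return the AMX-related flags found in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... amx_bf16 ..."
            _, _, value = line.partition(":")
            return AMX_FLAGS & set(value.split())
    return set()


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file, or return an empty string if it is missing."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    found = amx_flags(read_cpuinfo())
    if found:
        print("AMX flags reported:", ", ".join(sorted(found)))
    else:
        print("No AMX flags reported on this machine.")
```

If `amx_bf16` is missing from the output, BF16 weights will not be accelerated by AMX on that CPU regardless of which model file is downloaded.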

BF16 safetensors should work. See the following webpage for reference: https://www.intel.com/content/www/us/en/developer/articles/code-sample/advanced-matrix-extensions-intrinsics-functions.html