aubreyli comments


[Bug] Why AMX speed down?

AMX only takes effect during the prefill phase, and only if the batch size is large enough. In your case, it does not participate in the decode phase. Also, BF16 provides higher...

> My 6th-gen Xeon with 2 Intel Arc A770 cards runs the full-size Q4-quantized model at 1.9~2.9 t/s with 44 threads. I'm not sure whether everyone else sees the same.

What's your OS distro and version? What are your memory size and speed? Different software component versions could cause different performance....

Intel GPU inference speed

Please refer to issue #1329. According to your hardware configuration, your decode speed should be around 5 to 6 tokens per second.

Error when enabling the AMX feature

You need to download the BF16 GGUF. See the [AMX doc](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/AMX.md).

Error when enabling the AMX feature

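> > You need to download the BF16 GGUF. See the [AMX doc](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/AMX.md).
>
> Let me confirm: I'm currently using the q4_K_M GGUF. Do I need to download the BF16 version of the GGUF again and replace the files?

Yes.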

Error when enabling the AMX feature

> Is that the version that needs 1.3T in total?

If you have to use DeepSeek 671B, then yes. Otherwise, you might want to try AMX with a smaller model, such as the Qwen3-30B BF16 GGUF.
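The 1.3T figure is consistent with the parameter count, since BF16 stores each weight in 2 bytes. A rough sketch of that arithmetic, assuming the nominal 671B parameter count and ignoring GGUF metadata overhead (the function name `bf16_size_bytes` is only illustrative):

```python
def bf16_size_bytes(num_params: int) -> int:
    """Approximate size of BF16 weights: 2 bytes per parameter."""
    return num_params * 2


# Nominal parameter count for DeepSeek 671B (assumed to be exactly 671e9 here).
size = bf16_size_bytes(671_000_000_000)
print(f"{size / 1e12:.2f} TB")  # prints "1.34 TB" (decimal terabytes)
```

By the same estimate, a model with about 30B parameters needs roughly 60 GB in BF16.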

Error when enabling the AMX feature

> Is there a timeline for the AMXInt4 backend?

Currently, AMX hardware mainly supports BF16 and INT8 formats. If you have low-precision weights (such as 4-bit), they must first be...
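On Linux, the AMX formats a CPU exposes show up as flags in `/proc/cpuinfo` (`amx_tile`, `amx_bf16`, `amx_int8`). A minimal sketch for listing them, assuming a Linux host; the helper `amx_flags` is hypothetical and not part of ktransformers:

```python
def amx_flags(cpuinfo_text: str) -> set[str]:
    """Return the AMX-related CPU flags found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like "flags\t\t: fpu vme ... amx_bf16 amx_tile amx_int8"
            _, _, value = line.partition(":")
            return {flag for flag in value.split() if flag.startswith("amx")}
    return set()


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(sorted(amx_flags(fh.read())))
    except OSError:
        print("/proc/cpuinfo is not available on this platform")
```

An empty result means the kernel reports no AMX support, in which case the AMX backend cannot be used on that machine.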

Error when enabling the AMX feature

BF16 safetensors should work. See the following webpage for reference: https://www.intel.com/content/www/us/en/developer/articles/code-sample/advanced-matrix-extensions-intrinsics-functions.html