
Error when enabling the AMX feature

Open kmnns opened this issue 6 months ago • 10 comments

When I run the local_chat command with --optimize_config_path ./....../optimize_rules/DeepSeek-V3-Chat-amx.yaml, it fails: as soon as the expert layers start to be deployed, a precision-conversion error is reported. How can I resolve this?

[Image: screenshot of the error]

kmnns avatar Jun 27 '25 06:06 kmnns

You need to download the BF16 GGUF. See the AMX doc.
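If you want to confirm what your current files contain, here is a minimal sketch, assuming the `gguf` Python package that ships with llama.cpp (the filename below is a placeholder):

```python
# Count the quantization types of every tensor in a GGUF file, so you can see
# whether it is BF16 or a Q4_K_M quant before enabling the AMX optimize rules.
from collections import Counter
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("DeepSeek-V3-BF16-00001-of-000XX.gguf")  # placeholder path
print(Counter(t.tensor_type.name for t in reader.tensors))
# A BF16 file reports mostly 'BF16'; a Q4_K_M file reports 'Q4_K' / 'Q6_K' entries.
```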

aubreyli avatar Jun 27 '25 06:06 aubreyli

You need to download the BF16 GGUF. See the AMX doc.

Just to confirm: I'm currently using the Q4_K_M GGUF. So I need to re-download the BF16 version of the GGUF and replace the current files, right?

kmnns avatar Jun 27 '25 06:06 kmnns

Is that the version that takes about 1.3 TB in total?

kmnns avatar Jun 27 '25 06:06 kmnns

You need to download the BF16 GGUF. See the AMX doc.

Just to confirm: I'm currently using the Q4_K_M GGUF. So I need to re-download the BF16 version of the GGUF and replace the current files, right?

Yes

aubreyli avatar Jun 27 '25 08:06 aubreyli

Is that the version that takes about 1.3 TB in total?

If you have to use DeepSeek 671B, then yes. Otherwise, you might want to use a smaller model like the Qwen3-30B BF16 GGUF to try AMX.

aubreyli avatar Jun 27 '25 08:06 aubreyli

Is there a timeline for the AMXInt4 backend?

trilog-inc avatar Jun 27 '25 19:06 trilog-inc

Is there a timeline for the AMXInt4 backend?

Currently, AMX hardware mainly supports BF16 and INT8 formats. If you have low-precision weights (such as 4-bit), they must first be dequantized into either BF16 or INT8 before AMX can be used for computation.
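As a rough illustration of that dequantization step, here is a sketch with a made-up group-wise 4-bit layout; it is not ktransformers' actual kernel or GGUF's exact block format:

```python
import torch

def dequantize_q4_to_bf16(packed: torch.Tensor, scales: torch.Tensor,
                          group_size: int = 32) -> torch.Tensor:
    """packed: uint8, two 4-bit values per byte; scales: one float per group of weights."""
    lo = (packed & 0x0F).to(torch.int8) - 8   # low nibble, recentered around zero
    hi = (packed >> 4).to(torch.int8) - 8     # high nibble
    q = torch.stack((lo, hi), dim=-1).reshape(packed.shape[0], -1)
    groups = q.reshape(q.shape[0], -1, group_size).to(torch.float32)
    return (groups * scales.unsqueeze(-1)).reshape(q.shape[0], -1).to(torch.bfloat16)

# Only after this step do the weights exist in a format (BF16) that the AMX tiles
# accept; the matmul itself then runs on BF16 operands.
```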

aubreyli avatar Jun 30 '25 02:06 aubreyli

I understand. With only 512 GB of RAM, is there a future where R1/V3 can be used with AMX optimizations?

I only ask because in the AMX docs, the performance results table seems to hint at Qwen3-235B being loaded at 4-bit and consuming 160 GB of system RAM. Rough calculations of R1 loaded this way would put it at ~460 GB. Obviously not everything scales equally.
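For what it's worth, the scaling behind that estimate, using only the numbers above (actual usage would also depend on KV cache, activations, and per-layer overhead):

```python
# Scale the reported ~160 GB for Qwen3-235B at 4-bit by parameter count to
# estimate DeepSeek-R1/V3 (671B) at the same precision.
qwen3_params_b, qwen3_mem_gb = 235, 160
gb_per_billion_params = qwen3_mem_gb / qwen3_params_b   # ~0.68 GB per billion params
print(round(gb_per_billion_params * 671))                # ~457 GB
```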

trilog-inc avatar Jun 30 '25 14:06 trilog-inc

@aubreyli Is the problem that the data layout of the GGUF BF16 version differs from the BF16 format used by the hardware? Otherwise, couldn't the BF16 safetensors version on Hugging Face be used directly?

always-H avatar Dec 03 '25 07:12 always-H

Safetensors BF16 should work. See the following webpage for reference: https://www.intel.com/content/www/us/en/developer/articles/code-sample/advanced-matrix-extensions-intrinsics-functions.html
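A quick way to confirm a checkpoint really is BF16, as a sketch using the `safetensors` package (the shard name is a placeholder):

```python
from safetensors.torch import load_file

state = load_file("model-00001-of-000XX.safetensors")  # placeholder shard name
name, tensor = next(iter(state.items()))
print(name, tensor.dtype)  # torch.bfloat16 for a BF16 checkpoint
```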

aubreyli avatar Dec 04 '25 03:12 aubreyli