Error when enabling the AMX feature
When launching the local_chat command with --optimize_config_path ./....../optimize_rules/DeepSeek-V3-Chat-amx.yaml, it errors out: as soon as it starts deploying the expert layers, it reports a precision-conversion problem. How can I resolve this?
You need to download the BF16 GGUF; see the AMX doc.
Do you mean the version that takes 1.3 TB in total?
If you have to use DeepSeek 671B, then yes. Otherwise, you might want to use a smaller model like the Qwen3-30B BF16 GGUF to try AMX.
Is there a timeline for the AMXInt4 backend?
Currently, AMX hardware mainly supports BF16 and INT8 formats. If you have low-precision weights (such as 4-bit), they must first be dequantized into either BF16 or INT8 before AMX can be used for computation.
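To make that concrete, here is a minimal sketch of what such a dequantization step looks like, assuming a simple symmetric 4-bit scheme with per-group scales (real GGUF quant formats like Q4_K are more involved; this is an illustration, not KTransformers' actual kernel):

```python
import torch

def dequantize_int4_to_bf16(packed: torch.Tensor, scales: torch.Tensor,
                            group_size: int = 32) -> torch.Tensor:
    """Unpack 4-bit weights (two nibbles per byte) and rescale to BF16.

    packed: uint8 tensor where each byte holds two 4-bit values
    scales: one scale per group of `group_size` weights
    """
    # Split each byte into its low and high nibbles -> values in [0, 15]
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    q = torch.stack((low, high), dim=-1).flatten().to(torch.int8) - 8  # center to [-8, 7]

    # Apply per-group scales, then cast to BF16 so AMX tiles can consume it
    q = q.view(-1, group_size).to(torch.float32)
    w = q * scales.view(-1, 1).to(torch.float32)
    return w.to(torch.bfloat16)
```

The resulting BF16 matrix is what the AMX tile instructions can actually multiply, which is why a 4-bit backend still pays this unpack/rescale cost even though the weights are stored at 4-bit.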
I understand. With only 512 GB of RAM, is there a future where R1/V3 can be used with AMX optimizations?
I only ask because the performance results table in the AMX docs seems to hint that Qwen3-235B is loaded at 4-bit and consumes 160 GB of system RAM. A rough calculation for R1 loaded the same way would put it at ~460 GB. Not everything scales equally, obviously...
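For reference, that back-of-the-envelope estimate follows from scaling the table entry linearly with parameter count (which ignores differences in architecture, KV cache, and quant mix):

```python
# Scale the reported Qwen3-235B footprint (160 GB at 4-bit) to
# DeepSeek-R1's 671B parameters, assuming memory grows linearly
# with parameter count.
qwen3_params, qwen3_mem_gb = 235e9, 160
r1_params = 671e9

r1_mem_gb = qwen3_mem_gb * (r1_params / qwen3_params)
print(f"~{r1_mem_gb:.0f} GB")  # ~457 GB, consistent with the ~460 GB estimate
```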
@aubreyli Could this be because the BF16 data layout in the GGUF differs from the BF16 layout the hardware uses? Otherwise, wouldn't it work to directly use the BF16 safetensors version from Hugging Face?
Safetensors BF16 should work. See the following webpage for reference: https://www.intel.com/content/www/us/en/developer/articles/code-sample/advanced-matrix-extensions-intrinsics-functions.html
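If you go that route, a quick way to confirm the shards actually hold BF16 tensors is to inspect one with the safetensors library (the file name below is a placeholder for one of your downloaded shards):

```python
from safetensors import safe_open

# Inspect a shard lazily, without loading all weights into memory.
# "model-00001-of-000163.safetensors" is a placeholder file name.
with safe_open("model-00001-of-000163.safetensors", framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        print(name, t.dtype, tuple(t.shape))  # expect torch.bfloat16
        break
```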