
Intel 2x6138: mixed GPU+CPU deployment with int8 dtype errors out; CPU-only or GPU-only int8 both work fine

Open foxjoe000 opened this issue 1 year ago • 2 comments

```
(base) fox@fox-SA5212M5:~/Desktop$ ftllm webui /media/fox/NGFF-476GB/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --device multicuda:0,1,cpu --dtype int8 -t 30
Running: streamlit run --server.port 1616 /home/fox/miniconda3/lib/python3.12/site-packages/ftllm/web_demo.py -- /media/fox/NGFF-476GB/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --threads 30 --dtype int8 --atype auto --kv_cache_limit auto --max_batch -1 --device multicuda:0,1,cpu --moe_experts -1

You can now view your Streamlit app in your browser.

Local URL: http://localhost:1616
Network URL: http://192.168.2.43:1616

Load libnuma.so.1
CUDA_ARCH: 890
USE_TENSOR_CORE: 1
Load libfastllm_tools.so
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Convert 100
Warmup...
Error: CpuRunMatmul unsupport type: 3.
```

— foxjoe000, Apr 15 '25

Mixed GPU+CPU computation with int8 is not supported yet; other precisions should work.

— ztxz16, Apr 15 '25

Confirmed by testing: int4 and float16 both work.
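For reference, a working invocation can be obtained by rerunning the reporter's original command with one of the confirmed dtypes. This is a sketch based on the command in the log above; only `--dtype` is changed, all other paths and flags are the reporter's:

```shell
# Same mixed GPU+CPU deployment as above, but with int4,
# which (per this thread) works with multicuda:0,1,cpu.
# float16 is the other tested alternative.
ftllm webui /media/fox/NGFF-476GB/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --device multicuda:0,1,cpu \
    --dtype int4 \
    -t 30
```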

— foxjoe000, Apr 17 '25