
How do I deploy on an RTX 50-series GPU?

Open xjn-La-La-land opened this issue 3 months ago • 7 comments

CPU: Intel Xeon w7-3565X (64) @ 4.80 GHz
GPU: NVIDIA GeForce RTX 5090 32 GB
RAM: DDR5-5600 ECC Registered, 64 GB × 16 = 1024 GB
OS: Ubuntu 22.04.4 LTS x86_64
KTransformers version: v0.3.2
CUDA version: 12.9

Running the DeepSeek-V2-Lite-Chat model:

python -m ktransformers.local_chat --model_path deepseek-ai/DeepSeek-V2-Lite-Chat --gguf_path ./DeepSeek-V2-Lite-Chat-GGUF

Neither of the Hugging Face transformers versions I tried seems to let KTransformers run:

  • transformers = 4.43.2
ImportError: cannot import name 'FlashAttentionKwargs' from 'transformers.modeling_flash_attention_utils' (/home/xiejianan/miniconda3/envs/ktransformers/lib/python3.11/site-packages/transformers/modeling_flash_attention_utils.py)
  • transformers = 4.56.1
ValueError: You should provide exactly one of layers or layer_class_to_replicate to initialize a Cache.

How can I deploy KTransformers on an RTX 50-series GPU (CUDA version above 12.8)?

xjn-La-La-land · Sep 18 '25

I have the same problem. Could someone share a working deployment setup for the 5090?

jyizheng · Sep 21 '25

Same problem here; transformers==4.56.2 throws:

ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.

Yuan-Allen · Sep 24 '25

Successful RTX 5090 Installation Guide (Step 6)

Hardware Specs:

🖥️ Ryzen 9 9950X3D

🧠 DDR5-5600 256GB

🎮 RTX 5090

🚀 Samsung 990PRO 2TB + 2TB

Final Environment Status:

```text
🎯 Environment Check Successful
PyTorch: 2.8.0+cu128
Transformers: 4.56.1
KTransformers: 0.3.2
CUDA available: True
GPU: NVIDIA GeForce RTX 5090
```

Step 6: KTransformers Installation (Critical Steps)

```bash
# Clone repository
cd ~
git clone https://github.com/KVCache-AI/KTransformers.git
cd KTransformers

# Submodule initialization (IMPORTANT!)
git submodule init
git submodule update

# Installation
make dev_install

# Verification
python -c "import ktransformers; print('Installation successful!')"
```

Key Success Factors:

✅ Submodule initialization is mandatory

✅ PyTorch 2.8+ with CUDA 12.8 compatibility

✅ make dev_install for complete setup

✅ RTX 5090 works perfectly with KTransformers

Hope this helps others struggling with RTX 50 series setup! 🚀
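If you want to reproduce the environment check output above, here is a minimal sketch. The ktransformers package may not expose `__version__`, so this reads the installed package version via `importlib.metadata` instead:

```bash
python - <<'EOF'
from importlib.metadata import version
import torch, transformers

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("KTransformers:", version("ktransformers"))
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
EOF
```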

RFharaga · Sep 24 '25

Can you run the Qwen3-235B model?

jyizheng · Sep 30 '25

Same issue on a 5080: ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache. Environment: transformers==4.57.0, ktransformers==0.3.2+cu128torch28avx2, CUDA 12.8, torch 2.8.0. I also tried a 4060 Ti and hit the same error. Does anyone know how to fix this?

zhizi42 · Oct 14 '25

After a few days of digging, the fix turned out to be simple: when you git clone, clone the tagged release instead of the default branch: git clone --branch v0.3.2 https://github.com/kvcache-ai/ktransformers.git
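Combined with the submodule and install steps from the guide earlier in this thread, the full sequence would look roughly like this (directory name follows the clone URL; adjust as needed):

```bash
# Clone the tagged release instead of the default branch
git clone --branch v0.3.2 https://github.com/kvcache-ai/ktransformers.git
cd ktransformers

# Submodules are still required
git submodule init
git submodule update

# Build and install
make dev_install
```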

zhizi42 · Oct 18 '25

> After a few days of digging, the fix turned out to be simple: when you git clone, clone the tagged release instead of the default branch: git clone --branch v0.3.2 https://github.com/kvcache-ai/ktransformers.git

Can a working Docker image be built from the Dockerfile in this repo? Which parts of that file need to be changed to support the 5090?
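I haven't checked the repo's Dockerfile myself, but as a rough sketch, the parts that usually matter for Blackwell cards are the CUDA base image version, the PyTorch wheel (cu128), and the CUDA arch list (RTX 5090 is compute capability 12.0). The build-arg names below are assumptions, not necessarily the ones the project's Dockerfile actually defines:

```bash
# Sketch only -- the ARG names are assumptions; check the repo's Dockerfile for the real ones.
# Essentials for an RTX 5090: CUDA >= 12.8 base image, cu128 PyTorch wheel,
# and sm_120 (compute capability 12.0) in the CUDA arch list.
docker build \
  --build-arg CUDA_VERSION=12.8.1 \
  --build-arg TORCH_CUDA_ARCH_LIST="12.0" \
  -t ktransformers:rtx5090 .

# Run with GPU access and a mounted model directory (path is illustrative)
docker run --gpus all -it --rm \
  -v /path/to/DeepSeek-V2-Lite-Chat-GGUF:/models \
  ktransformers:rtx5090
```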

cqllzp · Oct 26 '25