ktransformers

[Feature] Is there any chance to support unsloth's IQ1, IQ2 quantized DeepSeek 0528 and V3.1 models ?

Open wqshmzh opened this issue 3 months ago • 1 comments

Checklist

  • [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
  • [x] 2. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-English/Chinese content without translation may be closed.

Motivation

I have tried running unsloth's DeepSeek-R1-0528-IQ2_XXS and DeepSeek-V3.1-IQ2_M with KT. Neither model is fully supported by KT yet, and running either one raises errors. I only have a desktop computer with 256GB of DDR5 memory. I would be very grateful for a reply soon.

Related resources

No response

wqshmzh, Sep 22 '25 12:09

@wqshmzh you do understand that ktransformers uses an old version of flashinfer under the hood to handle attention? Have you even tried running, say, one of unsloth's quants with ktransformers at long context? You will get garbage output at random, from time to time, with no chance of debugging it. I followed flashinfer's RSS feed for a while and I am certain they don't know what they are doing. So the question is: what is ktransformers trying to achieve here? What are you trying to achieve? Got 256GB of RAM? Pick a suitable quant from @Thireus and stick to ik_llama.cpp. There is no point in using ktransformers.

magikRUKKOLA, Sep 23 '25 08:09