Mukul Tripathi
@ubergarm Would you be able to post a guide on how to make the IQ4 version of the Qwen Model?
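(Not ubergarm's actual recipe, but for anyone wanting the broad strokes while waiting for a guide: the usual llama.cpp-style flow is to convert the HF checkpoint to a high-precision GGUF, compute an importance matrix on calibration text, and then quantize with it. The paths, calibration file, and the `IQ4_K` target below are assumptions, and ubergarm's published quants use custom per-tensor mixes beyond this plain sketch.)

```bash
# Hypothetical sketch of the standard convert -> imatrix -> quantize flow;
# paths, calibration data, and quant type are placeholders, not ubergarm's recipe.

# 1. Convert the Hugging Face checkpoint to a high-precision GGUF
python convert_hf_to_gguf.py /models/Qwen3-235B-A22B \
    --outtype bf16 --outfile /models/qwen3-bf16.gguf

# 2. Build an importance matrix from a representative calibration corpus
./build/bin/llama-imatrix -m /models/qwen3-bf16.gguf \
    -f calibration.txt -o /models/qwen3-imatrix.dat

# 3. Quantize using the imatrix; IQ4_K is one of ik_llama.cpp's IQ4-class types
./build/bin/llama-quantize --imatrix /models/qwen3-imatrix.dat \
    /models/qwen3-bf16.gguf /models/qwen3-IQ4_K.gguf IQ4_K
```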
> [@mtcl](https://github.com/mtcl)
>
> What is the model you are running with KTransformers?
>
> On the "crash": the DeepSeek self attention mechanism is special (different from basically any other...
> That is with `ik_llama.cpp`. My question was what model are you running with KTransformers?

Oh sorry! I understand now. I am running a Q4_K_M-FP8 hybrid model, if you want...
Can you please help me modify this command to get more context length with my 2x4090 setup?

```bash
CUDA_VISIBLE_DEVICES="0, 1" ./build/bin/llama-server \
    --model /media/mukul/backup/models/ubergarm/DeepSeek-R1-0528-GGUF/IQ3_K_R4/DeepSeek-R1-0528-IQ3_K_R4-00001-of-00007.gguf \
    --alias ubergarm/DeepSeek-R1-0528-GGUF \
    --ctx-size 32768...
```
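(Not speaking for ubergarm, and the command above is truncated, but here is a sketch of the flags that usually buy context headroom on two 24 GB cards with ik_llama.cpp. The context size, `-amb` value, and split ratio are guesses to tune, not tested settings.)

```bash
# Hypothetical sketch, not tested settings -- values are starting points to tune.
# -ctk q8_0 roughly halves KV-cache VRAM vs f16; -mla/-fa shrink the DeepSeek KV footprint;
# -amb caps the attention compute buffer (lower = less VRAM per batch);
# --override-tensor exps=CPU keeps routed experts in RAM so VRAM holds attention + cache;
# --tensor-split spreads the offloaded layers across both 4090s.
CUDA_VISIBLE_DEVICES="0,1" ./build/bin/llama-server \
    --model /media/mukul/backup/models/ubergarm/DeepSeek-R1-0528-GGUF/IQ3_K_R4/DeepSeek-R1-0528-IQ3_K_R4-00001-of-00007.gguf \
    --alias ubergarm/DeepSeek-R1-0528-GGUF \
    --ctx-size 65536 \
    -ctk q8_0 \
    -mla 2 -fa \
    -amb 512 \
    -fmoe \
    --n-gpu-layers 63 \
    --override-tensor exps=CPU \
    --tensor-split 1,1 \
    --threads 24 --host 127.0.0.1 --port 8080
```

If 65536 OOMs, back the context off in steps; the q8_0 cache and a smaller `-amb` are what make the larger window plausible at all on 24 GB cards.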
OK, I posted the whole video here, showing every command I ran with all the log outputs: https://www.youtube.com/watch?v=kDhu0siTvEg

> I think if you are able to offload two layers of...
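(For anyone following along: the per-layer offload being discussed in that quote is done in ik_llama.cpp with `--override-tensor` / `-ot` regexes. Rules are matched in order, so the `exps=CPU` catch-all must come last; the layer numbers below are illustrative only and have to be tuned to what actually fits on each 4090.)

```bash
# Illustrative sketch: pin the expert tensors of layers 3-4 on GPU 0 and 5-6 on GPU 1,
# leaving the remaining routed experts on CPU. Layer numbers are placeholders.
CUDA_VISIBLE_DEVICES="0,1" ./build/bin/llama-server \
    --model /media/mukul/backup/models/ubergarm/DeepSeek-R1-0528-GGUF/IQ3_K_R4/DeepSeek-R1-0528-IQ3_K_R4-00001-of-00007.gguf \
    -ot "blk\.(3|4)\.ffn_.*=CUDA0" \
    -ot "blk\.(5|6)\.ffn_.*=CUDA1" \
    -ot exps=CPU \
    --n-gpu-layers 63 -mla 2 -fa -fmoe \
    --ctx-size 32768 -ctk q8_0
```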
I tried modifying the command like this, but I get an error:

```bash
(base) mukul@jarvis:~/dev-ai/ik_llama.cpp$ CUDA_VISIBLE_DEVICES="0, 1" ./build/bin/llama-server \
    --model /media/mukul/backup/models/ubergarm/DeepSeek-R1-0528-GGUF/IQ3_K_R4/DeepSeek-R1-0528-IQ3_K_R4-00001-of-00007.gguf \
    --alias ubergarm/DeepSeek-R1-0528-GGUF \
    --ctx-size 32768 \
    -ctk q8_0 \
    ...
```
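(The error text is cut off above, so this is only a guess at the usual suspects: the space in `CUDA_VISIBLE_DEVICES="0, 1"` — some CUDA versions are picky about whitespace in the device list, so `"0,1"` is safer — and plain VRAM exhaustion from the extra offloaded tensors. Two quick checks:)

```bash
# List the GPUs CUDA can see, then watch VRAM while the model loads;
# an abort during load with memory climbing toward 24 GB usually means the
# offload rules put more tensors on one card than it can hold.
nvidia-smi -L
watch -n 1 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```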