KV cache placement question when running deepseek-r1-dynamic-1.58-bit
I have 8 × 4090s. I want to put the model layers on 7 of the cards and reserve the remaining card for the KV cache. How should I set this up? Right now the KV cache sits in CPU memory and inference is very slow.
I'm no expert, but I think adjusting the tensor_split setting should fix it. It seems you should be able to split the model tensors across 7 cards and push the remaining tensors and the KV cache onto the 8th.
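As a rough sketch of that idea, something like the following llama.cpp invocation might work. The flag names (`-ngl`, `--tensor-split`, `--main-gpu`) are real llama.cpp options, but the model filename is a placeholder, and note that llama.cpp normally allocates each layer's KV cache on the same GPU as that layer, so a zero split weight on card 7 frees its VRAM rather than pinning the KV cache there by itself:

```shell
# Sketch, not a verified recipe -- behavior varies by llama.cpp version.
# -ngl 999             : offload all layers to GPU instead of CPU
# --tensor-split ...   : relative weight split across the 8 GPUs;
#                        a 0 for the last card keeps its VRAM mostly free
# --main-gpu 7         : place non-split tensors and scratch buffers on card 7
./llama-cli \
  -m DeepSeek-R1-UD-IQ1_S.gguf \
  -ngl 999 \
  --tensor-split 1,1,1,1,1,1,1,0 \
  --main-gpu 7 \
  -p "Hello"
```

It would be worth experimenting with the split ratios and checking `nvidia-smi` to see where the KV cache actually lands.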
Similar question. Is there any guide on this?
Can the latest llama.cpp now support running the deepseek-r1-dynamic-1.58-bit model, assuming sufficient hardware memory?
This issue was closed because it has been inactive for 14 days since being marked as stale.