
KV cache placement question when running deepseek-r1-dynamic-1.58-bit

junyan-zg opened this issue 10 months ago · 3 comments

I have 8 × 4090s. I want to put the model layers on 7 of the cards via gpu_layers and dedicate the free card to the KV cache. How do I set that up? Right now the KV cache sits on the CPU and it is very slow.

junyan-zg · Feb 08 '25

I'm no expert, but I think adjusting the tensor_split setting should fix it. You should be able to spread the model tensors across 7 cards and push the remaining tensors and the KV cache onto the last card.
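For what it's worth, a minimal sketch of what that invocation might look like with llama-cli. The flag names (-ngl, --split-mode, --tensor-split, --main-gpu) are real llama.cpp options, but the model filename and the split proportions are placeholders; as I understand it, the KV cache for an offloaded layer is allocated on the same GPU that holds that layer, so giving one card a smaller layer share is how you leave it headroom rather than pinning the KV cache there directly:

```bash
# Illustrative sketch only; model path and split ratios are placeholders.
# -ngl offloads layers to GPU, --tensor-split sets per-GPU proportions,
# and --main-gpu selects the device for buffers that are not split.
# Since the KV cache for an offloaded layer lives on that layer's GPU,
# shrinking GPU 7's share of layers leaves it extra room.
./llama-cli -m DeepSeek-R1-UD-IQ1_S.gguf \
  -ngl 99 --split-mode layer \
  --tensor-split 1,1,1,1,1,1,1,0.25 \
  --main-gpu 7 \
  -p "hello"
```

You would tune the last proportion up or down depending on how much KV cache headroom your context length actually needs.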

akaikite · Feb 11 '25

Similar question. Is there any guide on this?

nullnuller · Feb 12 '25

Can the latest llama.cpp run the deepseek-r1-dynamic-1.58-bit model, assuming sufficient hardware memory?

leeetao · Feb 13 '25

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Mar 30 '25