llama.cpp
Bug: when --ctx-size is not set, n_ctx defaults to n_ctx_train, causing deepseek-v2 models to crash with out-of-memory even for a small output length.
What happened?
The deepseek-v2 model hits an out-of-memory error: the KV buffer allocation is about 43 GB, because the model's 160K training context length is used as n_ctx by default. When -c / --ctx-size is set to 2048, inference works normally.
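For scale, a rough back-of-the-envelope estimate shows why the default context exhausts memory. This is only a sketch: the layer and head dimensions below are assumptions taken from the published DeepSeek-V2-Lite config (they are not in this report), and the cache is assumed to be f16.

```python
# Rough KV-cache size estimate when llama.cpp allocates the full context.
# Architecture numbers are assumptions from the DeepSeek-V2-Lite config.
n_layer    = 27       # transformer layers (assumed)
n_head_kv  = 16       # KV heads (assumed)
head_dim_k = 192      # per-head K dim: 128 nope + 64 rope (assumed)
head_dim_v = 128      # per-head V dim (assumed)
bytes_el   = 2        # f16 cache elements

def kv_cache_gib(n_ctx):
    """Bytes for K and V across all layers and tokens, in GiB."""
    per_token = n_layer * n_head_kv * (head_dim_k + head_dim_v) * bytes_el
    return n_ctx * per_token / 2**30

print(f"{kv_cache_gib(160 * 1024):.1f} GiB")  # default n_ctx = n_ctx_train
print(f"{kv_cache_gib(2048):.2f} GiB")        # with -c 2048
```

Under these assumptions the default 160K context needs roughly 42 GiB of KV cache, in line with the ~43 GB allocation reported above, while -c 2048 needs well under 1 GiB.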
Name and Version
./build/bin/llama-cli -m deepseek-v2-lite-chat-q4_0.gguf -p "how to build a website?" -n 32 -e -ngl 29 -sm none
Linux, built from the master branch at commit c8a0090922bad576623de4aae227717085249262
What operating system are you seeing the problem on?
No response
Relevant log output
No response