llama.cpp
Bug: when --ctx-size is not set, n_ctx defaults to n_ctx_train, causing deepseek-v2 models to crash with out-of-memory even for a small output length.
What happened?
The deepseek-v2 model hits an out-of-memory error: the KV buffer allocation is about 43 GB, because the model's 160K training context length is used as n_ctx by default. When -c / --ctx-size is set to 2048, inference works normally.
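For scale, a rough back-of-the-envelope estimate shows why the default context exhausts memory. This is only a sketch: the layer and head dimensions below are assumptions taken from the published DeepSeek-V2-Lite config (they are not in this report), and the cache is assumed to be f16.

```python
# Rough KV-cache size estimate when llama.cpp allocates the full context.
# Architecture numbers are assumptions from the DeepSeek-V2-Lite config.
n_layer    = 27       # transformer layers (assumed)
n_head_kv  = 16       # KV heads (assumed)
head_dim_k = 192      # per-head K dim: 128 nope + 64 rope (assumed)
head_dim_v = 128      # per-head V dim (assumed)
bytes_el   = 2        # f16 cache elements

def kv_cache_gib(n_ctx):
    """Bytes for K and V across all layers and tokens, in GiB."""
    per_token = n_layer * n_head_kv * (head_dim_k + head_dim_v) * bytes_el
    return n_ctx * per_token / 2**30

print(f"{kv_cache_gib(160 * 1024):.1f} GiB")  # default n_ctx = n_ctx_train
print(f"{kv_cache_gib(2048):.2f} GiB")        # with -c 2048
```

Under these assumptions the default 160K context needs roughly 42 GiB of KV cache, in line with the ~43 GB allocation reported above, while -c 2048 needs well under 1 GiB.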
Name and Version
./build/bin/llama-cli -m deepseek-v2-lite-chat-q4_0.gguf -p "how to build a website?" -n 32 -e -ngl 29 -sm none
Linux, built from the master branch at commit c8a0090922bad576623de4aae227717085249262
What operating system are you seeing the problem on?
No response
Relevant log output
No response