llama.cpp
Change ./main help output to better reflect context size's effect on generation length
Discussed in https://github.com/ggerganov/llama.cpp/discussions/446
Originally posted by cmp-nct on March 24, 2023: I've been testing alpaca 30B (-t 24 -n 2000 --temp 0.2 -b 32 --n_parts 1 --ignore-eos --instruct). It consistently "stops" after 300-400 tokens of output (with 30-40 tokens of input). There is no error message and no crash, and given -n 2000 and --ignore-eos there is no apparent reason for it to stop so early.
I guess it would be useful if the program provided a verbose reason for quitting, though in my case I can't see any reason for it to stop before the token maximum is reached.
I'm not sure if that's a bug to report or if I am missing something.
```
-c N, --ctx_size N    size of the prompt context (default: 512)
```
You'll want to set -c 2048 (the maximum recommended for LLaMA) and set -n 2048 as well; effectively they work together. ctx_size is the hard cap on the whole session, while n_predict is the maximum number of tokens it can emit before stopping. The default ctx_size is 512, which is why it stops after 400 tokens or so.
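For example, an invocation along these lines raises both limits so generation isn't cut off at the 512-token default. This is only a sketch built from the flags quoted above: the model path after -m is a placeholder, and flag spellings may differ between llama.cpp versions.

```sh
# Sketch only: -m path is a placeholder; other flags mirror the ones quoted above.
#   -c 2048  sets ctx_size, the hard cap on prompt + generated tokens for the session
#   -n 2048  sets n_predict, the max number of tokens to generate before stopping
./main -m ./models/alpaca-30B/ggml-model-q4_0.bin \
  -t 24 -b 32 --n_parts 1 --temp 0.2 --ignore-eos --instruct \
  -c 2048 -n 2048
```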
Thanks for the explanation. I think the help output needs a change: it describes "-c" as the context for the prompt (which didn't make much sense to me), not as the context for the whole result!
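For illustration only, a clearer description might read something like the following (hypothetical wording, not an actual patch):

```
-c N, --ctx_size N    size of the context window in tokens, shared by the prompt
                      and the generated output; generation stops once the context
                      is full (default: 512)
```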