llama.cpp
Change ./main help output to better reflect context size's effect on generation length
Discussed in https://github.com/ggerganov/llama.cpp/discussions/446
Originally posted by cmp-nct on March 24, 2023: I've been testing alpaca 30B (-t 24 -n 2000 --temp 0.2 -b 32 --n_parts 1 --ignore-eos --instruct). It consistently "stops" after 300-400 tokens of output (with 30-40 tokens of input). There is no error message and no crash, and given -n 2000 and --ignore-eos there is no apparent reason for it to stop so early.
I guess it would be useful if the program provided a verbose reason for quitting, though in my case I can't see any reason for it to stop before the token maximum is reached.
I'm not sure if that's a bug to report or if I am missing something.
```
-c N, --ctx_size N    size of the prompt context (default: 512)
```
You'll want to set -c 2048 (the maximum recommended for LLaMA) and set -n 2048 as well; effectively they work together. ctx_size is the hard cap on the whole session, while n_predict is the maximum number of tokens it can emit before stopping. The default ctx_size is 512, which is why it stops after 400 tokens or so.
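For example, an invocation along these lines raises both limits so generation isn't cut off at the 512-token default. This is only a sketch built from the flags quoted above: the model path after -m is a placeholder, and flag spellings may differ between llama.cpp versions.

```sh
# Sketch only: -m path is a placeholder; other flags mirror the ones quoted above.
#   -c 2048  sets ctx_size, the hard cap on prompt + generated tokens for the session
#   -n 2048  sets n_predict, the max number of tokens to generate before stopping
./main -m ./models/alpaca-30B/ggml-model-q4_0.bin \
  -t 24 -b 32 --n_parts 1 --temp 0.2 --ignore-eos --instruct \
  -c 2048 -n 2048
```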
Thanks for the explanation. I think the help output needs a change: it describes "-c" as the context for the prompt (which didn't make much sense to me), not as the context for the whole result!
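For illustration only, a clearer description might read something like the following (hypothetical wording, not an actual patch):

```
-c N, --ctx_size N    size of the context window in tokens, shared by the prompt
                      and the generated output; generation stops once the context
                      is full (default: 512)
```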