llama.cpp
Maybe lower default temp and switch to top_k 40
Per this Twitter thread; see the commit here.
`--top_k N    top-k sampling (default: 40)`
AFAIK, there is no top-k filtering in the current version. The main code uses `llama_sample_top_p`, and not `gpt_sample_top_k_top_p`, which is the only piece of code that actually uses the `top_k` parameter.
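For reference, the filtering step itself is small. Here is a minimal standalone sketch (not the actual `gpt_sample_top_k_top_p` code) that keeps only the `k` highest logits and masks out the rest before softmax:

```cpp
// Minimal top-k filter sketch (illustrative, not llama.cpp source):
// keep the k highest logits and push everything else to -inf so those
// tokens receive zero probability after softmax.
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

void top_k_filter(std::vector<float> & logits, int k) {
    if (k <= 0 || k >= (int) logits.size()) {
        return; // k covers the whole vocabulary: nothing to filter
    }
    // find the k-th largest logit without fully sorting
    std::vector<float> sorted = logits;
    std::nth_element(sorted.begin(), sorted.begin() + (k - 1), sorted.end(),
                     std::greater<float>());
    const float threshold = sorted[k - 1];
    // mask out everything strictly below the threshold
    // (ties at the threshold may keep slightly more than k tokens)
    for (float & l : logits) {
        if (l < threshold) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```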
The repetition penalty could maybe be ported to this sampler and used instead?
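One common formulation is the CTRL-style penalty, which rescales the logits of recently generated tokens before sampling. A sketch under that assumption (the function name and signature are illustrative, not from llama.cpp):

```cpp
// Sketch of a CTRL-style repetition penalty applied to the logits of
// the last N generated tokens before sampling (illustrative only).
#include <cstdint>
#include <vector>

void apply_repeat_penalty(std::vector<float> & logits,
                          const std::vector<int32_t> & last_n_tokens,
                          float penalty) {
    for (const int32_t tok : last_n_tokens) {
        // dividing a positive logit (or multiplying a negative one)
        // by penalty > 1 makes that token less likely to repeat
        if (logits[tok] > 0.0f) {
            logits[tok] /= penalty;
        } else {
            logits[tok] *= penalty;
        }
    }
}
```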
I've seen multiple people report that FB's default sampler is not adequate for comparing LLaMA's outputs with davinci's. Enabling top-k filtering would let people experiment with and compare different sampling strategies.
It does seem to work much better with these options, based on shawwn's patch: `--temp 0.7 --top_k 40 --top_p 0 --repeat_last_n 256 --repeat_penalty 1.1764705882352942` (that penalty value is 1/0.85).
I'm not sure what value is ideal for `repeat_last_n`, but with a little testing, 256 seems to be enough, while 128 wasn't.
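For anyone who wants to reproduce this, an invocation along these lines should exercise the same settings (the model path and prompt are placeholders):

```
./main -m ./models/7B/ggml-model-q4_0.bin \
    -p "Building a website can be done in 10 simple steps:" \
    -n 256 --temp 0.7 --top_k 40 --top_p 0 \
    --repeat_last_n 256 --repeat_penalty 1.1764705882352942
```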
> --temp 0.7 --top_k 40 --top_p 0 --repeat_last_n 256 --repeat_penalty 1.1764705882352942
llama.cpp with `--top_p 0` is greedy inference, picking the highest-probability token.
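To see why, here's a simplified sketch of nucleus (top-p) truncation, not the exact llama.cpp implementation: tokens are kept until their cumulative probability exceeds `p`, so with `p = 0` the single highest-probability token already crosses the threshold and is the only survivor.

```cpp
// Simplified nucleus (top-p) cutoff (illustrative): probs must be
// sorted in descending order; returns how many top tokens survive.
#include <cstddef>
#include <vector>

size_t nucleus_size(const std::vector<float> & sorted_probs, float top_p) {
    float cum = 0.0f;
    for (size_t i = 0; i < sorted_probs.size(); ++i) {
        cum += sorted_probs[i];
        if (cum >= top_p) {
            // with top_p == 0 the very first token already satisfies
            // the cutoff, so sampling degenerates to argmax (greedy)
            return i + 1;
        }
    }
    return sorted_probs.size();
}
```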
FYI, top-k isn't used yet; this PR should fix it: https://github.com/ggerganov/llama.cpp/pull/56
We need a better strategy for determining default parameters; single examples don't show anything of value.