
Maybe lower default temp and switch to top_k 40

Open · bakkot opened this issue on Mar 12 '23 · 4 comments

Per this twitter thread. See commit here.

bakkot · Mar 12 '23 10:03

--top_k N top-k sampling (default: 40)

G2G2G2G · Mar 12 '23 12:03

AFAIK, there is no top-k filtering in the current version. The main code uses llama_sample_top_p, not gpt_sample_top_k_top_p, which is the only piece of code that actually uses the top_k parameter.
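For context, top-k filtering just restricts sampling to the k most probable tokens before renormalizing. A minimal sketch of the idea, with illustrative names (this is not the actual gpt_sample_top_k_top_p code):

#include <algorithm>
#include <utility>
#include <vector>

// Keep only the k highest logits; the later softmax/sampling step can then
// only pick from those k candidate tokens.
std::vector<std::pair<float, int>> top_k_filter(const std::vector<float> & logits, int k) {
    std::vector<std::pair<float, int>> cand;
    cand.reserve(logits.size());
    for (int i = 0; i < (int) logits.size(); ++i) {
        cand.emplace_back(logits[i], i);
    }
    if (k > (int) cand.size()) {
        k = (int) cand.size();
    }
    // Partially sort so the k largest logits come first, then drop the rest.
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end(),
                      [](const auto & a, const auto & b) { return a.first > b.first; });
    cand.resize(k);
    return cand;
}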

The repetition penalty could maybe be ported to this sampler and used instead?
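As a rough illustration of what porting it could look like, a CTRL-style repetition penalty scales down the logits of recently generated tokens before sampling. The function and parameter names below are hypothetical, not llama.cpp's actual API:

#include <unordered_set>
#include <vector>

// Scale down the logits of tokens seen in the recent context window
// (e.g. the last repeat_last_n tokens) so they are less likely to repeat.
void apply_repeat_penalty(std::vector<float> & logits,
                          const std::vector<int> & last_tokens,
                          float penalty /* e.g. ~1.18 */) {
    std::unordered_set<int> seen(last_tokens.begin(), last_tokens.end());
    for (int id : seen) {
        // Dividing a positive logit (or multiplying a negative one) by the
        // penalty pushes that token's probability down after the softmax.
        if (logits[id] > 0.0f) {
            logits[id] /= penalty;
        } else {
            logits[id] *= penalty;
        }
    }
}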

I've seen multiple people report that FB's default sampler is not adequate for comparing LLaMA's outputs with davinci's. Thus, enabling top-k filtering would let people experiment with and compare different sampling strategies.

Piezoid · Mar 12 '23 13:03

It does seem to work much better with these options, based on shawwn's patch: --temp 0.7 --top_k 40 --top_p 0 --repeat_last_n 256 --repeat_penalty 1.1764705882352942 (that repeat penalty is 1/0.85).

I'm not sure what value is ideal for repeat_last_n, but with a little testing, 256 seems to be enough, while 128 wasn't.

sswam · Mar 12 '23 16:03

--temp 0.7 --top_k 40 --top_p 0 --repeat_last_n 256 --repeat_penalty 1.1764705882352942

llama.cpp with --top_p 0 is greedy inference, picking the highest probability token.
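To spell out why: top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches p, so with p = 0 the very first token already satisfies the threshold and only the argmax survives. A simplified sketch of that behavior (not the actual llama_sample_top_p code):

#include <cstddef>
#include <utility>
#include <vector>

// probs must be sorted by probability, highest first.
// Keeps the smallest prefix whose cumulative probability reaches top_p;
// with top_p == 0 only the single most probable token is kept, i.e. greedy.
void top_p_filter(std::vector<std::pair<float, int>> & probs, float top_p) {
    float cumsum = 0.0f;
    std::size_t keep = probs.size();
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumsum += probs[i].first;
        if (cumsum >= top_p) { // fires immediately when top_p == 0
            keep = i + 1;
            break;
        }
    }
    probs.resize(keep);
}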

Piezoid · Mar 12 '23 17:03

FYI, top-k isn't actually used yet; this PR should fix it though: https://github.com/ggerganov/llama.cpp/pull/56

beiller · Mar 12 '23 20:03

We need a better strategy for determining the default parameters. Single examples do not show anything of value.

ggerganov · Mar 13 '23 17:03