llama.cpp
Maybe lower default temp and switch to top_k 40
Per this Twitter thread; see the commit here.
`--top_k N    top-k sampling (default: 40)`
AFAIK, there is no top-k filtering in the current version. The main code uses `llama_sample_top_p`, and not `gpt_sample_top_k_top_p`, which is the only piece of code that actually uses the `top_k` parameter.
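For reference, the filtering step itself is small. Here is a minimal standalone sketch (not the actual `gpt_sample_top_k_top_p` code) that keeps only the `k` highest logits and masks out the rest before softmax:

```cpp
// Minimal top-k filter sketch (illustrative, not llama.cpp source):
// keep the k highest logits and push everything else to -inf so those
// tokens receive zero probability after softmax.
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

void top_k_filter(std::vector<float> & logits, int k) {
    if (k <= 0 || k >= (int) logits.size()) {
        return; // k covers the whole vocabulary: nothing to filter
    }
    // find the k-th largest logit without fully sorting
    std::vector<float> sorted = logits;
    std::nth_element(sorted.begin(), sorted.begin() + (k - 1), sorted.end(),
                     std::greater<float>());
    const float threshold = sorted[k - 1];
    // mask out everything strictly below the threshold
    // (ties at the threshold may keep slightly more than k tokens)
    for (float & l : logits) {
        if (l < threshold) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```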
The repetition penalty could maybe be ported to this sampler and used instead?
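One common formulation is the CTRL-style penalty, which rescales the logits of recently generated tokens before sampling. A sketch under that assumption (the function name and signature are illustrative, not from llama.cpp):

```cpp
// Sketch of a CTRL-style repetition penalty applied to the logits of
// the last N generated tokens before sampling (illustrative only).
#include <cstdint>
#include <vector>

void apply_repeat_penalty(std::vector<float> & logits,
                          const std::vector<int32_t> & last_n_tokens,
                          float penalty) {
    for (const int32_t tok : last_n_tokens) {
        // dividing a positive logit (or multiplying a negative one)
        // by penalty > 1 makes that token less likely to repeat
        if (logits[tok] > 0.0f) {
            logits[tok] /= penalty;
        } else {
            logits[tok] *= penalty;
        }
    }
}
```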
I've seen multiple people report that FB's default sampler is not adequate for comparing LLaMA's outputs with davinci's. Enabling top-k filtering would let people experiment with and compare different sampling strategies.
It does seem to work much better with these options, based on shawwn's patch: `--temp 0.7 --top_k 40 --top_p 0 --repeat_last_n 256 --repeat_penalty 1.1764705882352942` (that penalty value is 1/0.85).
I'm not sure what value is ideal for `repeat_last_n`, but with a little testing, 256 seems to be enough, while 128 wasn't.
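For anyone who wants to reproduce this, an invocation along these lines should exercise the same settings (the model path and prompt are placeholders):

```
./main -m ./models/7B/ggml-model-q4_0.bin \
    -p "Building a website can be done in 10 simple steps:" \
    -n 256 --temp 0.7 --top_k 40 --top_p 0 \
    --repeat_last_n 256 --repeat_penalty 1.1764705882352942
```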
> --temp 0.7 --top_k 40 --top_p 0 --repeat_last_n 256 --repeat_penalty 1.1764705882352942
llama.cpp with `--top_p 0` is greedy inference, picking the highest-probability token.
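To see why, here's a simplified sketch of nucleus (top-p) truncation, not the exact llama.cpp implementation: tokens are kept until their cumulative probability exceeds `p`, so with `p = 0` the single highest-probability token already crosses the threshold and is the only survivor.

```cpp
// Simplified nucleus (top-p) cutoff (illustrative): probs must be
// sorted in descending order; returns how many top tokens survive.
#include <cstddef>
#include <vector>

size_t nucleus_size(const std::vector<float> & sorted_probs, float top_p) {
    float cum = 0.0f;
    for (size_t i = 0; i < sorted_probs.size(); ++i) {
        cum += sorted_probs[i];
        if (cum >= top_p) {
            // with top_p == 0 the very first token already satisfies
            // the cutoff, so sampling degenerates to argmax (greedy)
            return i + 1;
        }
    }
    return sorted_probs.size();
}
```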
FYI, top-k isn't used yet; this PR should fix it: https://github.com/ggerganov/llama.cpp/pull/56
We need a better strategy for determining default parameters; single examples don't show anything of value.