gpt4all-chat
Add compatibility with new sampling algorithms in llama.cpp
Description: This pull request addresses issue https://github.com/nomic-ai/gpt4all-chat/issues/200#issue-1689677866 by adding compatibility with new sampling algorithms in llama.cpp.
Changes:
Replaced the previous llama_sample_top_p_top_k call with the new sampler chain: repetition penalty, top-k and top-p filtering, and temperature sampling.
```cpp
// Temperature sampling with repetition_penalty, replacing llama_sample_top_p_top_k

// Penalize tokens that appear in the last repeat_last_n positions of the context
llama_sample_repetition_penalty(
    d_ptr->ctx, &candidates_data,
    promptCtx.tokens.data() + promptCtx.n_ctx - promptCtx.repeat_last_n,
    promptCtx.repeat_last_n, promptCtx.repeat_penalty);
// Restrict candidates to the top_k most likely tokens, then to the top_p nucleus
llama_sample_top_k(d_ptr->ctx, &candidates_data, promptCtx.top_k);
llama_sample_top_p(d_ptr->ctx, &candidates_data, promptCtx.top_p);
// Scale the remaining logits by temperature and sample the next token
llama_sample_temperature(d_ptr->ctx, &candidates_data, promptCtx.temp);
llama_token id = llama_sample_token(d_ptr->ctx, &candidates_data);
```
I will look at this, but I will need to update the submodule at the same time; otherwise this will break. But this helps a ton! Thanks @kuvaus!