Changing default repeat_last_n value to current context size?
I noticed that llama 7b almost always gets stuck in a loop after a certain amount of time. This problem has recurred throughout the entire time I have been trying to use llama.cpp (since March 15). I have also tried different models such as alpaca and gpt4all unfiltered, but the problem remains. It also becomes obvious when you try to generate a dialog following some kind of plot (I use --keep to keep the plot summary in context). Every time I've tried to generate something open-ended, it starts looping at some point, even in interactive mode.
I also noticed that setting repeat_last_n to the current context size helps eliminate this issue. (I use ctx_size 2048 most of the time.)
Maybe, after some testing, the default repeat_last_n value could be changed to the currently set context size, so newcomers could avoid this issue?
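For reference, the suggested workaround looks roughly like this on the command line (the model path is a placeholder, and the underscore flag spellings are the ones from the era of this thread; newer builds use `--repeat-last-n` / `--ctx-size`):

```
./main -m ./models/7B/ggml-model-q4_0.bin --ctx_size 2048 --repeat_last_n 2048 -i
```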
If I understand correctly, repeat_last_n works by avoiding generating any of the tokens (words) seen in the last n, so if you set it as high as something like 2048, the text will be coherent at first but rapidly devolve into flowery, nonsensical speech as the model looks for new tokens that haven't been used yet.
That is how it works in theory, and my experience in practice matches. It certainly makes for an 'interesting' dialog / chat / whatever when you set it that high, but good luck making any sense of the answers if they're more than a couple of paragraphs.
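For readers unfamiliar with the mechanism, here is a minimal sketch of how a last-n repeat penalty of this kind operates on the logits before sampling. It mirrors the standard divide-positive/multiply-negative penalty formula, but the names and structure are illustrative, not the actual llama.cpp code:

```cpp
// Minimal sketch of a last-n repeat penalty: any token seen in the last
// `repeat_last_n` tokens of the history has its logit pushed down before
// sampling, so the model avoids re-emitting it.
#include <cstdint>
#include <unordered_set>
#include <vector>

using llama_token = int32_t; // assumed token id type

void apply_repeat_penalty(std::vector<float> &logits,
                          const std::vector<llama_token> &history,
                          int repeat_last_n, float penalty) {
    // Collect the unique tokens in the penalized window.
    std::unordered_set<llama_token> recent;
    const size_t start = history.size() > (size_t) repeat_last_n
                             ? history.size() - repeat_last_n : 0;
    for (size_t i = start; i < history.size(); ++i) {
        recent.insert(history[i]);
    }
    for (llama_token tok : recent) {
        float &l = logits[tok];
        // Dividing a positive logit (or multiplying a negative one) by the
        // penalty makes the token less likely. With repeat_last_n == n_ctx,
        // *every* token in context is suppressed, which is why very large
        // windows drift into "flowery" unused vocabulary.
        l = l > 0 ? l / penalty : l * penalty;
    }
}
```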
This is the issue I wanted to address with #331.
@rmn20 Can you try this branch? On it, setting --repeat_half_life 32 will detect repeats over the whole context, but recent and long repeats are penalized more strongly than older and shorter ones.
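To make the half-life idea concrete, here is a hypothetical sketch of a recency-decayed repeat penalty. The function name, the 0.5^(age / half_life) decay, and the penalty scaling are my assumptions for illustration, not the code from the #331 branch (which, per the description above, also weights longer repeated sequences more heavily):

```cpp
// Hypothetical sketch of the idea behind --repeat_half_life: instead of a
// hard cutoff at repeat_last_n, weight each past occurrence of a token by
// how recently it appeared, halving the weight every `half_life` tokens.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

using llama_token = int32_t; // assumed token id type

void apply_decayed_repeat_penalty(std::vector<float> &logits,
                                  const std::vector<llama_token> &history,
                                  float half_life, float max_penalty) {
    // Accumulate a decayed "repeat weight" per token over the whole context.
    std::unordered_map<llama_token, float> weight;
    const size_t n = history.size();
    for (size_t i = 0; i < n; ++i) {
        const float age = (float)(n - 1 - i); // 0 for the newest token
        weight[history[i]] += std::pow(0.5f, age / half_life);
    }
    for (const auto &[tok, w] : weight) {
        // Scale the penalty smoothly between 1 (no effect) and max_penalty,
        // so recent/frequent repeats are punished more than old/rare ones.
        const float penalty = 1.0f + (max_penalty - 1.0f) * std::min(w, 1.0f);
        float &l = logits[tok];
        l = l > 0 ? l / penalty : l * penalty;
    }
}
```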
This issue was closed because it has been inactive for 14 days since being marked as stale.