
Is context window size limited to 2k tokens, regardless of the model used?

Open · brankoradovanovic-mcom opened this issue 1 year ago · 1 comment

It seems that the message "Recalculating context" in the chat (or "LLaMA: reached the end of the context window so resizing" during API calls) appears after 2k tokens, regardless of the model used.

When that happens, the models indeed forget the content that preceded the current context window. For example, they become unable to answer multiple questions about a chunk of text, because each question and answer adds to the conversation and eventually pushes the chunk outside the context window. From that point on, the answers become 100% hallucinations.
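
To make the failure mode concrete, here is a rough back-of-the-envelope sketch (the 2048-token window matches the observation above; the chunk size and per-turn token counts are made up for illustration):

```python
# Rough illustration of how Q&A turns push an initial chunk of text out of a
# fixed 2048-token context window. All token counts are hypothetical.
CTX_WINDOW = 2048
chunk_tokens = 1200       # size of the pasted chunk of text
tokens_per_turn = 250     # hypothetical size of one question + answer

total = chunk_tokens
for turn in range(1, 6):
    total += tokens_per_turn
    # Once the conversation exceeds the window, the oldest tokens -- i.e. the
    # start of the chunk -- are dropped when the context is recalculated.
    evicted = max(0, total - CTX_WINDOW)
    print(f"turn {turn}: {total} tokens total, {evicted} tokens of history evicted")
```

After a handful of turns the entire chunk has been evicted, which is exactly when the answers stop being grounded in it.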

There are many models now that advertise large context window sizes - Yi LLMs in particular. However, from what I've tried, none of that seems to work in GPT4All: "Recalculating context" always appears at the 2k mark. Is it really supposed to be this way?

(Note this is not the same issue as #1638. I'd argue it is actually worse. Prompt size limitations could be worked around by chunking the input. Unfortunately, since the context window size is the same as max prompt size, chunking the input doesn't help at all.)

brankoradovanovic-mcom avatar Dec 26 '23 21:12 brankoradovanovic-mcom

The 2k-token context size was hard-coded, independent of the model used, until recently. It is now fixed; see https://github.com/nomic-ai/gpt4all/commit/d1c56b8b28a7239f0ec0c3e07b0745cc527beeb5 and the bug reports #1749 and #1668. However, that fix is not yet in the current release (there has been no new release since the fix).
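
Once a release containing that commit ships, something like the following should work from the Python bindings (a minimal sketch, assuming the bindings expose the `n_ctx` parameter added by the fix; the model file name is illustrative, and the model itself must support the requested context length):

```python
from gpt4all import GPT4All

# n_ctx raises the context window beyond the previously hard-coded 2048 tokens.
# The model file name is illustrative; any local GGUF model should do, but it
# must support the requested context length.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", n_ctx=4096)

with model.chat_session():
    print(model.generate("Summarize the following text: ...", max_tokens=200))
```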

dlippold avatar Dec 29 '23 00:12 dlippold

I temporarily reopened #1668 for visibility.

cebtenzzre avatar Dec 29 '23 21:12 cebtenzzre

I put 200000 tokens in the GPT4All CLI program, but I still get that error often, and probably way before 200000!

Zibri avatar May 13 '24 03:05 Zibri