                        Is context window size limited to 2k tokens, regardless of the model used?
It seems that the message "Recalculating context" in the chat (or "LLaMA: reached the end of the context window so resizing" during API calls) appears after 2k tokens, regardless of the model used.
When that happens, the model does indeed forget the content that preceded the current context window. For example, it becomes unable to answer multiple questions about a chunk of text, because each question and answer adds to the conversation and eventually pushes that chunk outside the context window. From that point on, the answers are pure hallucination.
There are many models now that advertise large context windows - the Yi LLMs in particular. However, in my tests none of them work as advertised in GPT4All: "Recalculating context" always appears at the 2k mark. Is it really supposed to be this way?
(Note this is not the same issue as #1638. I'd argue it is actually worse: prompt size limitations could be worked around by chunking the input, but since the context window size is the same as the maximum prompt size, chunking doesn't help at all.)
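For reference, here's roughly how the problem shows up through the Python bindings. This is a minimal sketch, assuming the gpt4all package is installed; the model file name is just an example, and the text is a stand-in for a real document:

```python
from gpt4all import GPT4All

# Any local model will do; this file name is just an example.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

# Stand-in for a real chunk of text that the questions refer to.
long_text = "The quick brown fox jumps over the lazy dog. " * 200

with model.chat_session():
    print(model.generate(f"Here is a document:\n{long_text}\nSummarize it."))
    # Once the accumulated prompt + answers pass roughly 2k tokens,
    # "LLaMA: reached the end of the context window so resizing" appears
    # and the document at the start of the conversation is forgotten.
    for i in range(10):
        print(model.generate(f"Question {i}: what does the document say?"))
```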
The 2k token context size was hard-coded, independently of the model used, until recently. It is now fixed, see https://github.com/nomic-ai/gpt4all/commit/d1c56b8b28a7239f0ec0c3e07b0745cc527beeb5 and the bug reports #1749 and #1668. However, that fix is not yet in the current release (there has been no new release since the fix).
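Once a release containing that commit is out, the context size should be configurable rather than fixed at 2048. For example, via the Python bindings - a sketch assuming the n_ctx constructor parameter is available in the bindings version you have:

```python
from gpt4all import GPT4All

# Sketch: n_ctx requests a larger context window. This only takes
# effect in builds that include commit d1c56b8b, and the model itself
# must actually support the requested length.
model = GPT4All(
    "yi-34b-chat.Q4_0.gguf",  # example file name for a long-context model
    n_ctx=8192,
)
print(model.generate("Hello", max_tokens=32))
```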
I temporarily reopened #1668 for visibility.
I set the context size to 200000 tokens in the CLI program, but I still get that error often, and probably well before reaching 200000!