Stop hard-coding the context size and use the correct size per model
System Info
Currently we hard-code the context window size. We need to use the correct context window size as defined by the model.
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Reproduction
Notice that the context size is hard-coded in source control.
Expected behavior
The context size should be defined per model, and we should honor it.
I have been experimenting with setting different context sizes at https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-backend/llamamodel.cpp#L137C31-L137C31 and I think it would indeed be great if the model automatically determined the correct context size. I just wonder if there will be problems with the advent of models that have a large context size set by default. Users may fail to load a model because of limited VRAM/RAM.
Exposing and retaining the ability to set a (max?) context size, and giving users the choice to override the default, may be reasonable, especially since other apps (e.g. LM Studio) already provide this option.
The plan right now is to default to 2048 context but allow the user to change it via the UI. If the user requests more context than the model supports, it will warn to console (which is not ideal on Windows since there is no console).
An alternative is to default to the context size supported by the model, but allow overriding it, which is what I originally had in mind. (I wish ooba's TGWUI did this, since it currently enforces the model's context size, and won't let you set it to anything smaller.) The downside to this approach is that the user still has to realize that they may be able to decrease the context size to fit the model in memory.
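A minimal Python sketch of the planned behavior (default 2048, user override, warn when the request exceeds the model's trained context); the helper name `effective_n_ctx` and the `n_ctx_train` parameter are illustrative assumptions, not gpt4all's actual code:

```python
import warnings

DEFAULT_N_CTX = 2048

def effective_n_ctx(requested, n_ctx_train):
    """Pick the context size: the user's request if given, else 2048,
    warning when the request exceeds the model's trained context."""
    n_ctx = requested if requested is not None else DEFAULT_N_CTX
    if n_ctx > n_ctx_train:
        warnings.warn(
            f"requested context ({n_ctx}) exceeds the model's trained "
            f"context ({n_ctx_train}); generation quality may degrade"
        )
    return n_ctx

print(effective_n_ctx(None, 4096))  # -> 2048 (default)
print(effective_n_ctx(8192, 4096))  # -> 8192, with a warning
```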
Use the default context size (n_ctx_train), and check the total memory. If the memory is too small, then decrease the default context size or warn, and allow the user to change it via the UI.
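A rough sketch of that heuristic: start from n_ctx_train and shrink until the KV cache fits in the available memory. The helper name and the f16 cache-size estimate are assumptions for illustration, not gpt4all's implementation:

```python
def choose_n_ctx(n_ctx_train, n_layer, n_embd, free_mem_bytes,
                 kv_bytes_per_elt=2):
    """Start from the model's trained context and halve it until the
    KV cache fits in the available memory (rough f16 estimate)."""
    n_ctx = n_ctx_train
    while n_ctx > 512:
        # K and V caches: 2 tensors of shape (n_ctx, n_layer, n_embd)
        kv_bytes = 2 * n_ctx * n_layer * n_embd * kv_bytes_per_elt
        if kv_bytes <= free_mem_bytes:
            break
        n_ctx //= 2
    return n_ctx

# Example: a 4096-ctx, 32-layer, 4096-dim model with 1 GiB free
print(choose_n_ctx(4096, 32, 4096, 1 << 30))  # -> 2048
```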
I support the suggestion from snowyu, and I want to further suggest displaying the amount of GPU VRAM GPT4All can use, as well as the model's maximum context size, beside the input field (or slider) for the context size GPT4All should use.
How to change n_ctx in python without a local build? (I mean, there's no file "/gpt4all-backend/llamamodel.cpp" there.)
How to change n_ctx in python without a local build?
Currently the only option is to build the python bindings from source. I am working on improving this situation.
Thanks for your response!
Also, for long prompts, why don't we simply keep the beginning of the prompt, truncate the middle, and add only the final part of the too-long context, then maybe display a small warning that the text was longer than the context window?
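A hedged sketch of that middle-truncation idea, operating on token IDs; the function and the marker token are hypothetical, not gpt4all's actual behavior:

```python
def truncate_middle(tokens, n_ctx, marker):
    """Keep the beginning and end of `tokens`, dropping the middle so the
    result (including a marker for the elided span) fits in n_ctx tokens."""
    if len(tokens) <= n_ctx:
        return tokens
    keep = n_ctx - len(marker)
    head = keep // 2
    tail = keep - head
    print("warning: prompt longer than context window; middle truncated")
    return tokens[:head] + marker + tokens[-tail:]

toks = list(range(100))
out = truncate_middle(toks, n_ctx=10, marker=[-1])
print(out)  # first 4 tokens, marker, last 5 tokens
```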
Reopening for visibility until the release comes out.
This is included in the v2.6.1 release.
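For reference, with the updated Python bindings the context size should be settable at load time. A short usage sketch, assuming the `n_ctx` keyword shipped alongside this release and a locally available GGUF model file:

```python
from gpt4all import GPT4All

# Model filename is an example; any locally available GGUF model works.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", n_ctx=4096)
print(model.generate("Hello", max_tokens=50))
```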