Stop hard-coding the context size and use the correct size per model
System Info
Currently we hard-code the context window size. We need to use the correct context window size as defined by the model.
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Reproduction
Notice that the context size is hard-coded in source control.
Expected behavior
The context size should be defined per model, and we should honor it.
I have been experimenting with setting different context sizes at https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-backend/llamamodel.cpp#L137C31-L137C31 and I think it would indeed be great if the model automatically determined the correct context size. I just wonder if there will be problems with the advent of models that have a large context size set by default. Users may fail to load a model because of limited VRAM/RAM.
Exposing and retaining the ability to set a (max?) context size, and giving users the choice to override the default, may be reasonable, especially since other apps (e.g. LM Studio) already provide this option.
The plan right now is to default to 2048 context but allow the user to change it via the UI. If the user requests more context than the model supports, it will warn to console (which is not ideal on Windows since there is no console).
An alternative is to default to the context size supported by the model, but allow overriding it, which is what I originally had in mind. (I wish ooba's TGWUI did this, since it currently enforces the model's context size, and won't let you set it to anything smaller.) The downside to this approach is that the user still has to realize that they may be able to decrease the context size to fit the model in memory.
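A minimal Python sketch of the planned behavior (default 2048, user override, warn when the request exceeds the model's trained context); the helper name `effective_n_ctx` and the `n_ctx_train` parameter are illustrative assumptions, not gpt4all's actual code:

```python
import warnings

DEFAULT_N_CTX = 2048

def effective_n_ctx(requested, n_ctx_train):
    """Pick the context size: the user's request if given, else 2048,
    warning when the request exceeds the model's trained context."""
    n_ctx = requested if requested is not None else DEFAULT_N_CTX
    if n_ctx > n_ctx_train:
        warnings.warn(
            f"requested context ({n_ctx}) exceeds the model's trained "
            f"context ({n_ctx_train}); generation quality may degrade"
        )
    return n_ctx

print(effective_n_ctx(None, 4096))  # -> 2048 (default)
print(effective_n_ctx(8192, 4096))  # -> 8192, with a warning
```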
Use the default context size (n_ctx_train), and check the total memory. If the memory is too small, then decrease the default context size or warn, and allow the user to change it via the UI.
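A rough sketch of that heuristic: start from n_ctx_train and shrink until the KV cache fits in the available memory. The helper name and the f16 cache-size estimate are assumptions for illustration, not gpt4all's implementation:

```python
def choose_n_ctx(n_ctx_train, n_layer, n_embd, free_mem_bytes,
                 kv_bytes_per_elt=2):
    """Start from the model's trained context and halve it until the
    KV cache fits in the available memory (rough f16 estimate)."""
    n_ctx = n_ctx_train
    while n_ctx > 512:
        # K and V caches: 2 tensors of shape (n_ctx, n_layer, n_embd)
        kv_bytes = 2 * n_ctx * n_layer * n_embd * kv_bytes_per_elt
        if kv_bytes <= free_mem_bytes:
            break
        n_ctx //= 2
    return n_ctx

# Example: a 4096-ctx, 32-layer, 4096-dim model with 1 GiB free
print(choose_n_ctx(4096, 32, 4096, 1 << 30))  # -> 2048
```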
I support the suggestion from snowyu, and I want to further suggest displaying the amount of GPU VRAM GPT4All can use, as well as the model's maximum context size, beside the input field (or slider) for the context size GPT4All should use.
How to change n_ctx in python without a local build? (I mean, there's no file "/gpt4all-backend/llamamodel.cpp" there.)
How to change n_ctx in python without a local build?
Currently the only option is to build the python bindings from source. I am working on improving this situation.
Thanks for your response!
Also, for long prompts, why don't we simply keep the beginning of the prompt, truncate the middle, and add only the final part of the too-long context, then maybe display a small warning that the text was longer than the context window?
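A hedged sketch of that middle-truncation idea, operating on token IDs; the function and the marker token are hypothetical, not gpt4all's actual behavior:

```python
def truncate_middle(tokens, n_ctx, marker):
    """Keep the beginning and end of `tokens`, dropping the middle so the
    result (including a marker for the elided span) fits in n_ctx tokens."""
    if len(tokens) <= n_ctx:
        return tokens
    keep = n_ctx - len(marker)
    head = keep // 2
    tail = keep - head
    print("warning: prompt longer than context window; middle truncated")
    return tokens[:head] + marker + tokens[-tail:]

toks = list(range(100))
out = truncate_middle(toks, n_ctx=10, marker=[-1])
print(out)  # first 4 tokens, marker, last 5 tokens
```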
Reopening for visibility until the release comes out.
This is included in the v2.6.1 release.
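For reference, with the updated Python bindings the context size should be settable at load time. A short usage sketch, assuming the `n_ctx` keyword shipped alongside this release and a locally available GGUF model file:

```python
from gpt4all import GPT4All

# Model filename is an example; any locally available GGUF model works.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", n_ctx=4096)
print(model.generate("Hello", max_tokens=50))
```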