
Max token limit of 508. Configurable?

Open venuv opened this issue 1 year ago • 2 comments

I'm running a LangChain ConversationChain with gpt4all, and my program terminates with the error below. Is this an inherent limit, or is it configurable? Is anyone building FlashAttention into gpt4all for a 'wider' attention window? Tx

.... llama_generate: seed = 1681685660

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
llama_generate: error: prompt is too long (1431 tokens, max 508)

venuv · Apr 16 '23 23:04

I was thinking the same thing. I've never fully understood why LLM researchers went with a fixed size for this when the original transformer used sinusoidal positional encoding, achieved SOTA results, and had no hard limit.

I asked ChatGPT about it, and it explained that the original transformer was optimized for translation tasks as opposed to generative tasks; thus GPT and its derivatives are autoregressive. But what it couldn't explain to me was why autoregression is better served by learned, fixed-size positional embeddings.

devlux76 · Apr 17 '23 17:04

Here's something insightful.

To modify GPT4All-J to use sinusoidal positional encoding for attention, you would need to modify the model architecture and replace the default positional encoding used in the model with sinusoidal positional encoding.

Sinusoidal positional encoding uses sine and cosine functions with different frequencies to encode the position of each token in the input sequence. This allows the model to attend to relative positions and ensures that the attention mechanism can handle sequences of varying lengths.
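
For concreteness, this is the standard formulation from the original Transformer paper ("Attention Is All You Need"), where `pos` is the token position, `i` indexes the sin/cos dimension pair, and `d_model` is the embedding size:

```latex
% Sinusoidal positional encoding (Vaswani et al., 2017)
PE_{(pos,\,2i)}   = \sin\!\bigl(pos / 10000^{2i/d_{\mathrm{model}}}\bigr)
PE_{(pos,\,2i+1)} = \cos\!\bigl(pos / 10000^{2i/d_{\mathrm{model}}}\bigr)
```

Because each position's encoding is computed from a formula rather than looked up in a learned table, it can be generated for any sequence length.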

Here's a high-level overview of the steps you would need to take:

In the model's source code, locate the file that defines the transformer architecture (this file may contain classes or functions that define the self-attention mechanism and positional encoding).

Modify the positional encoding component of the transformer architecture. If the model uses learnable positional embeddings, you would need to replace them with sinusoidal positional encoding.

Define a function that generates sinusoidal positional encoding for each token position in the sequence. The function should take the sequence length and model dimension as input and return a tensor with sinusoidal values (see the sketch after these steps).

Ensure that the new positional encoding is applied to the input tokens before they are passed through the self-attention mechanism.

Retrain the modified model using the training instructions provided in the GPT4All-J repository (GitHub: nomic-ai/gpt4all). It's important to note that modifying the model architecture would require retraining the model with the new encoding, as the learned weights of the original model may not be directly transferable to the modified model.
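
As a rough illustration of the "define a function" step above, here is a minimal PyTorch sketch of a sinusoidal encoding generator. The function name and the point where the encoding is added to the embeddings are assumptions for illustration only; the real integration point depends on the GPT4All-J source you are modifying.

```python
import math
import torch


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) tensor of sinusoidal position encodings.

    Follows the "Attention Is All You Need" formulation; assumes d_model is even.
    """
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # Frequencies 1 / 10000^(2i / d_model), one per sin/cos dimension pair.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


# Illustrative usage (the actual hook point depends on the model code):
# hidden_states = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```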

Please note that the specific details and file names may vary based on the implementation of GPT4All-J, and further investigation of the source code is required to determine the exact locations and modifications needed to achieve the desired change.

devlux76 · Apr 17 '23 17:04

Seems solved
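
For anyone finding this later: recent gpt4all Python bindings expose the context window at model construction time. A minimal sketch, assuming the current `n_ctx` parameter and using an example model file name; check the binding version you have installed, since older releases did not offer this option.

```python
from gpt4all import GPT4All

# Assumes a recent gpt4all Python binding where the context window is set via
# `n_ctx`; the model file name below is just an example.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf", n_ctx=2048)

with model.chat_session():
    reply = model.generate("Summarize our conversation so far.", max_tokens=256)
    print(reply)
```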

niansa · Aug 11 '23 11:08