chatcraft.org
Floating context window for infinite chat
I want to be able to chat beyond the current model's context window token limit. Currently, when you hit the 4K (ChatGPT) or 8K (GPT-4) limit, you get an error saying that you've reached the maximum token count for the conversation.
When the user clicks Send, we take the current set of chat messages, prefix it with a system prompt, and send it to the API.
This is the right move when the total number of tokens in these messages is below the model's maximum. But when it exceeds that limit, we should apply some intelligence and trim the messages so the request can still be sent.
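As a sketch of the check we'd need before sending, here's a rough token estimate (~4 characters per token; a real implementation would use an actual tokenizer). The `ChatMessage` type and function names are illustrative, not ChatCraft's real ones:

```typescript
// Hypothetical message shape; ChatCraft's real types will differ.
type ChatMessage = { role: "system" | "user" | "assistant"; text: string };

// Very rough heuristic: ~4 characters per token. A real implementation
// should use a proper tokenizer (e.g., tiktoken) for accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function totalTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.text), 0);
}

// True when the message set (system prompt included) fits the model's window.
function fitsContext(messages: ChatMessage[], maxTokens: number): boolean {
  return totalTokens(messages) <= maxTokens;
}
```

If `fitsContext` returns false, that's when one of the trimming strategies below would kick in.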
I think there are various things we could do here, and likely some combination of the following:
- remove the oldest messages
- remove all AI messages, letting the human replies convey enough meaning to give context
- remove all prose from older messages (i.e., just keep the code blocks)
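The first option (dropping the oldest messages) might look something like this, using the same rough token estimate as above. This is a sketch, not ChatCraft's actual code; it assumes we always want to preserve system messages:

```typescript
type Msg = { role: string; text: string };

// Rough heuristic: ~4 characters per token.
const estTokens = (s: string) => Math.ceil(s.length / 4);

// Drop the oldest non-system messages, one at a time, until the total fits.
function trimOldest(messages: Msg[], maxTokens: number): Msg[] {
  const result = [...messages];
  const total = (msgs: Msg[]) => msgs.reduce((n, m) => n + estTokens(m.text), 0);
  while (total(result) > maxTokens) {
    const idx = result.findIndex((m) => m.role !== "system");
    if (idx === -1) break; // only system messages left; nothing more to trim
    result.splice(idx, 1);
  }
  return result;
}
```

The other two strategies would slot in the same way: a filter pass over `messages` before the length check.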
I wrote about this in my blog and @tarasglek added another thought.
I'm sure there are other things we could do here to compress the messages. GPT-4 suggested these:
- Summarize long messages: If a message is too long, you can try to summarize it before removing it entirely. This can be done using a text summarization library or API. This way, you can retain the essence of the message while reducing its token count.
- Remove less relevant AI messages: Instead of removing the oldest AI messages, you can remove AI messages that are less relevant to the current context. You can determine relevance by analyzing the content of the messages and comparing it to the most recent user input.
- Remove redundant information: If there are repeated or similar messages, you can remove the redundant ones to save tokens while preserving context.
- Compress code blocks: If a code block can be compressed without losing readability (e.g., by removing unnecessary whitespace or shortening variable names), you can do so to save tokens.
- Prioritize important keywords: Analyze the messages to identify important keywords or phrases that are crucial to the context. Make sure to retain messages containing these keywords while trimming others.
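A couple of these (keeping only code blocks, compressing whitespace) are cheap string transforms we could prototype quickly. A rough sketch, assuming messages use standard markdown fences:

```typescript
// Keep only fenced code blocks from a message body, dropping the prose.
function keepCodeOnly(text: string): string {
  const blocks = text.match(/```[\s\S]*?```/g);
  return blocks ? blocks.join("\n") : "";
}

// Cheap code-block compression: trim trailing whitespace and collapse
// runs of blank lines. (Shortening variable names would need real parsing.)
function compressWhitespace(text: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .join("\n")
    .replace(/\n{3,}/g, "\n\n");
}
```

Summarization and relevance ranking are heavier, since they need another model call or an embedding comparison, so they probably come later.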
What's a good starting algorithm to use in order to process a set of messages so it fits the current context window size?
https://github.com/openai/chatgpt-retrieval-plugin seems relevant
We can start with something simple, like LMStudio's options:
@DukeManh sounds good. I wonder if we should also try removing all AI messages except the last as a fourth option, and combine that with the rolling window idea?
Two other ideas:
- give the option to bump up to a larger context model (might be hard, since we have no knowledge of the models we use)
- do some kind of summarization pass (e.g., get AI to do it) over the previous chat and include that somehow
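For the summarization pass, one possible shape: replace everything older than the last few messages with a single system message holding an AI-generated summary. `summarizeFn` here is an assumed callback (in practice it would be an async chat-completion request asking the model to summarize); everything below is a sketch, not a committed design:

```typescript
type CMsg = { role: string; text: string };

// Replace all but the most recent `keepRecent` messages with one summary
// message. `summarizeFn` is a hypothetical hook; a real one would be an
// async call to the chat API ("Summarize this conversation: ...").
function summarizeOlder(
  messages: CMsg[],
  keepRecent: number,
  summarizeFn: (text: string) => string
): CMsg[] {
  if (messages.length <= keepRecent) return messages;
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(-keepRecent);
  const summary = summarizeFn(older.map((m) => `${m.role}: ${m.text}`).join("\n"));
  return [{ role: "system", text: `Summary of earlier conversation: ${summary}` }, ...recent];
}
```

The nice property is that this composes with the other trimming passes: summarize first, then fall back to dropping messages if the result still doesn't fit.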