Use as much context as possible without exceeding max context window
Attempts to address the second half of Issue #133:
- Walk the conversation history in reverse order to collect previous instructions and responses
- Keep building the full prompt until adding another exchange would exceed the context window size
- Use the last version of the prompt that does not exceed the context window size
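The steps above can be sketched roughly like this. This is a minimal illustration, not the actual Serge code: `history`, `count_tokens`, and the exchange format are hypothetical stand-ins.

```python
def build_prompt(history, count_tokens, max_tokens):
    """Build the largest prompt that fits, walking history newest-first.

    history: list of (instruction, response) pairs, oldest first.
    count_tokens: callable returning the token count of a string
                  (hypothetical; stands in for however tokens are measured).
    """
    prompt = ""
    for instruction, response in reversed(history):
        # Prepend the older exchange in front of what we already have.
        candidate = f"{instruction}\n{response}\n" + prompt
        if count_tokens(candidate) > max_tokens:
            break  # keep the last version that still fit
        prompt = candidate
    return prompt
```

Because older exchanges are prepended one at a time, the result always keeps the most recent context and drops the oldest exchanges first.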
@nsarrazin I'm not sure exactly how to calculate the token count; I just estimated it based on my understanding, so if there's a helper method or a better way to measure token counts, please let me know. This worked in my testing. There may be a more efficient or cleaner way of doing it, but I think the logic should hold.
As a result of this change, I was able to continue a long conversation without encountering the same bug as in #133.
Hi, is this going to be merged? I constantly have to delete my chats to be able to send more than one prompt in a chat.
I tried this PR, but I'm still getting the same issue, and some of my questions appear and disappear in a weird fashion :/
I will try to refactor the backend of serge to use the Python bindings, since they expose a tokenize method. We could use that to get a definitive answer on how many tokens we have and cut the prompt appropriately.
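For illustration, a truncation loop built on an exact tokenizer might look like this. This is a hedged sketch: `tokenize` stands in for whatever the Python bindings expose (e.g. llama-cpp-python's `Llama.tokenize`, which returns a list of token ids), and the function and history names here are hypothetical.

```python
def truncate_history(history, tokenize, max_tokens):
    """Drop the oldest exchanges until the serialized prompt fits.

    tokenize: callable returning a list of token ids for a string,
    so len(tokenize(s)) is the exact token count (no guessing).
    """
    while history:
        prompt = "".join(f"{q}\n{a}\n" for q, a in history)
        if len(tokenize(prompt)) <= max_tokens:
            return prompt
        history = history[1:]  # drop the oldest exchange first
    return ""
```

Re-tokenizing the whole prompt on each pass is not the cheapest approach, but it gives an exact count at every step, which is what avoids the over/undershooting that comes from estimating.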
I also tried this PR and encountered the same issue (albeit maybe less frequently).
@nsarrazin Ah, I see. This is likely due to estimating the number of tokens instead of measuring the actual count.
Does your other PR #143 supersede this change? If this is still a relevant change to make after you refactor to use tokenize, I can take another stab at this PR.
Thanks for confirming @kindrowboat