Differences between the GPT-4 and Llama tokenizers lead to a mismatch in token count during prompt pruning.
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: Windows 11 Pro
- Continue: v0.8.12
- IDE: VS Code
- Model: Codellama 70b (Free Trial), or any Llama-based model that relies on the Llama tokenizer.
Description
Current methods to estimate prompt token count use the GPT-4 tokenizer. However, Llama-based models use a different tokenizer, which leads to a mismatch between the token count estimated by Continue and the count actually seen by the model. In my testing, the Llama tokenizer consistently produced ~30% more tokens than the GPT-4 tokenizer. Depending on the configuration of the LLM server, this can cause inference errors when the prompt exceeds the maximum number of allowable tokens. See this Discord discussion on the topic.
Differences between the GPT-4 and Llama tokenizers can be explored using these links:
- https://platform.openai.com/tokenizer
- https://belladoreai.github.io/llama-tokenizer-js/example-demo/build/
Current token count estimation is done around here: https://github.com/continuedev/continue/blob/c8d793ec4599b954c4ec41fe4187d8e676e0b048/core/llm/countTokens.ts#L12
@sestinj has identified a JavaScript Llama tokenizer that may be worth exploring: https://github.com/belladoreai/llama-tokenizer-js.
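For illustration, here is a minimal sketch of what a model-aware token count could look like. It assumes the `js-tiktoken` and `llama-tokenizer-js` packages; the function names and the model-name check are just for illustration and are not Continue's actual implementation:

```ts
// Sketch only -- not the actual Continue code.
// Assumes "js-tiktoken" and "llama-tokenizer-js" are installed.
import { encodingForModel } from "js-tiktoken";
import llamaTokenizer from "llama-tokenizer-js";

const gpt4Encoding = encodingForModel("gpt-4");

// Count tokens with the GPT-4 tokenizer (what Continue currently uses for every model).
function countTokensGpt4(text: string): number {
  return gpt4Encoding.encode(text).length;
}

// Count tokens with the Llama tokenizer instead.
function countTokensLlama(text: string): number {
  return llamaTokenizer.encode(text).length;
}

// A model-aware counter could dispatch on the model name, so Llama-family models
// are pruned against the tokenizer the server will actually apply.
function countTokens(text: string, modelName: string): number {
  const isLlamaFamily = /llama/i.test(modelName); // heuristic; also matches "codellama"
  return isLlamaFamily ? countTokensLlama(text) : countTokensGpt4(text);
}

// Example: the same C++ snippet yields noticeably more Llama tokens than GPT-4 tokens.
const snippet = '#include <iostream>\nint main() { std::cout << "hello"; }';
console.log("GPT-4 tokens:", countTokensGpt4(snippet));
console.log("Llama tokens:", countTokensLlama(snippet));
```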
To reproduce
- Select the Codellama (Free Trial) option from the list of default models (or any Llama-based model)
- Determine the max prompt token length for that model
- Create a prompt of that size or one that knowingly exceeds it. To produce this I usually pass in a whole C++ source code file as reference via the "@" prompt operator.
- Submit the prompt to the LLM and observe the error caused by the token count difference, despite Continue performing prompt pruning
Log output
Continue error: HTTP 500 Internal Server Error from https://node-proxy-server-blue-l6vsfbzhba-uw.a.run.app/stream_complete
Error in Continue free trial server: 403 Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4097. Given: 5437 `inputs` tokens and 1024 `max_new_tokens`
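(For reference: with `max_new_tokens` = 1024, the input budget is 4097 - 1024 = 3073 tokens, yet the server's Llama tokenizer counted 5437 input tokens even though Continue had already pruned the prompt using its GPT-4-based estimate.)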
@rastna12 this is now available in pre-release. Here's the commit that did it: https://github.com/continuedev/continue/commit/e8bbdc06a192a9d6576b7019a164393c16019306
Let me know how it looks, and I'll wait to close the issue until you've verified.
I think this has been resolved given conversations in Discord. If I'm mistaken please re-open!