Llama 3 BPE tokenization needs improvement
What is the issue?
This PR, which just merged in llama.cpp, contains important improvements to how tokenization works for Llama 3 and other models. An example of the issue is noted here.
Hopefully ollama can update to the latest llama.cpp quickly and make a new release.
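For anyone who wants to check their own setup, here is a minimal repro sketch (not the linked example), assuming `pip install transformers requests`, access to the Hugging Face Llama 3 tokenizer, and a llama.cpp server running on localhost:8080; the sample strings are illustrative guesses at the kind of input that triggers the bug:

```python
# Compare the reference Hugging Face tokenizer against a locally
# running llama.cpp server's /tokenize endpoint.
import requests
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Digits and punctuation were typical trouble spots for the BPE
# pre-tokenizer regex; these samples are illustrative, not from the issue.
samples = ["Hello, world!", "3.14159", "x==y || a!=b", "2024-04-30"]

for text in samples:
    reference = tokenizer.encode(text, add_special_tokens=False)
    resp = requests.post("http://localhost:8080/tokenize", json={"content": text})
    resp.raise_for_status()
    actual = resp.json()["tokens"]
    marker = "OK  " if actual == reference else "DIFF"
    print(f"{marker} {text!r}: hf={reference} llama.cpp={actual}")
```

Any line flagged `DIFF` means llama.cpp's BPE tokenization diverges from the reference tokenizer for that input.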
OS
Linux
GPU
Nvidia
CPU
AMD
Ollama version
all versions up to this point
You might want to wait for https://github.com/ggerganov/llama.cpp/pull/6965 to be merged, too (should happen soon).
https://github.com/ggerganov/llama.cpp/pull/6965 has been merged now. I'm not sure when this was fixed in ollama, but I just tested with 0.1.35 and can't reproduce it anymore. Closing.
The llama.cpp commit linked in ollama is dated 4/30, and https://github.com/ggerganov/llama.cpp/pull/6965 was merged into llama.cpp on 5/9, so it doesn't look like that merge was included in the latest ollama release, 0.1.37. Does that mean ollama was changed to handle the previous llama.cpp behavior, and that a future llama.cpp sync in ollama will change behavior?
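One way to check which llama.cpp commit a given ollama tag pins, if you have the repo cloned locally. This is a sketch assuming ollama still vendors llama.cpp as a git submodule; the `llm/llama.cpp` path is an assumption about the repo layout:

```python
# Print the llama.cpp commit hash pinned by an ollama tag via the
# submodule entry in that tag's tree.
import subprocess

def pinned_llama_cpp_commit(ollama_repo: str, ref: str) -> str:
    # For a submodule path, `git ls-tree <ref> <path>` prints
    # "160000 commit <sha>\t<path>"; the sha is the pinned commit.
    out = subprocess.run(
        ["git", "-C", ollama_repo, "ls-tree", ref, "llm/llama.cpp"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[2]

print(pinned_llama_cpp_commit("/path/to/ollama", "v0.1.37"))
```

Comparing that commit's date against the 5/9 merge of PR 6965 would confirm whether the fix made it into a given release.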