Llama 3 BPE tokenization needs improvement
What is the issue?
This PR, which just merged in llama.cpp, contains important improvements to how tokenization works for Llama 3 and other models. An example of the issue is noted here.
Hopefully ollama can update to the latest llama.cpp quickly and make a new release.
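For anyone who wants to check their own setup, here is a minimal repro sketch (not the linked example), assuming `pip install transformers requests`, access to the Hugging Face Llama 3 tokenizer, and a llama.cpp server running on localhost:8080; the sample strings are illustrative guesses at the kind of input that triggers the bug:

```python
# Compare the reference Hugging Face tokenizer against a locally
# running llama.cpp server's /tokenize endpoint.
import requests
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Digits and punctuation were typical trouble spots for the BPE
# pre-tokenizer regex; these samples are illustrative, not from the issue.
samples = ["Hello, world!", "3.14159", "x==y || a!=b", "2024-04-30"]

for text in samples:
    reference = tokenizer.encode(text, add_special_tokens=False)
    resp = requests.post("http://localhost:8080/tokenize", json={"content": text})
    resp.raise_for_status()
    actual = resp.json()["tokens"]
    marker = "OK  " if actual == reference else "DIFF"
    print(f"{marker} {text!r}: hf={reference} llama.cpp={actual}")
```

Any line flagged `DIFF` means llama.cpp's BPE tokenization diverges from the reference tokenizer for that input.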
OS
Linux
GPU
Nvidia
CPU
AMD
Ollama version
all versions up to this point
You might want to wait for https://github.com/ggerganov/llama.cpp/pull/6965 to be merged, too (should happen soon).
https://github.com/ggerganov/llama.cpp/pull/6965 has been merged now. I'm not sure when this was fixed in ollama, but I just tested with 0.1.35 and can't reproduce it anymore. Closing.
The llama.cpp commit linked in ollama is dated 4/30, and https://github.com/ggerganov/llama.cpp/pull/6965 was merged into llama.cpp on 5/9, so it doesn't look like that merge was included in the latest ollama release, 0.1.37. Does that mean ollama was changed to handle the previous llama.cpp behavior, and that a future llama.cpp sync in ollama will change behavior?
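One way to check which llama.cpp commit a given ollama tag pins, if you have the repo cloned locally. This is a sketch assuming ollama still vendors llama.cpp as a git submodule; the `llm/llama.cpp` path is an assumption about the repo layout:

```python
# Print the llama.cpp commit hash pinned by an ollama tag via the
# submodule entry in that tag's tree.
import subprocess

def pinned_llama_cpp_commit(ollama_repo: str, ref: str) -> str:
    # For a submodule path, `git ls-tree <ref> <path>` prints
    # "160000 commit <sha>\t<path>"; the sha is the pinned commit.
    out = subprocess.run(
        ["git", "-C", ollama_repo, "ls-tree", ref, "llm/llama.cpp"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[2]

print(pinned_llama_cpp_commit("/path/to/ollama", "v0.1.37"))
```

Comparing that commit's date against the 5/9 merge of PR 6965 would confirm whether the fix made it into a given release.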