llama.cpp
EoS Tokenization issue for Nemo 12b
Name and Version
llama.cpp-b3999
Operating systems
Windows
GGML backends
CUDA
Hardware
2x RTX 3090 i7-7820X
Models
cgato/Nemo-12b-Humanize-KTO-v0.1
bartowski/Nemo-12b-Humanize-KTO-v0.1-GGUF
Problem description & steps to reproduce
When tokenizing the end token with Nemo, llama.cpp deviates from how HuggingFace transformers handles tokenization: the ChatML end token is split into several sub-word tokens instead of being tokenized as a single special token, which degrades model performance.
This issue does not occur with HF Transformers or ExLlama2 (example of the expected tokenization below).
It should be reproducible with any Nemo-based model that uses a prompt format like ChatML, where EoS tokens appear inside the prompt text. I'm unsure whether the issue is Nemo-specific or not.
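To make the failure mode concrete, here is a toy sketch (not llama.cpp's actual tokenizer code; the vocab and helper are invented for illustration) of the difference between a tokenizer that matches special tokens whole and one that falls back to sub-word pieces, which is the splitting behavior observed here:

```python
def tokenize(text, vocab, special_tokens):
    """Toy greedy longest-match tokenizer (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        # Special tokens, when enabled, are matched first and kept whole.
        for sp in special_tokens:
            if text.startswith(sp, i):
                tokens.append(sp)
                i += len(sp)
                break
        else:
            # Otherwise fall back to the longest matching sub-word piece.
            for end in range(len(text), i, -1):
                if text[i:end] in vocab:
                    tokens.append(text[i:end])
                    i = end
                    break
            else:
                tokens.append(text[i])
                i += 1
    return tokens

vocab = {"<", "|", "im", "_", "end", ">", "Hello"}

# Expected (HF Transformers / ExLlama2 style): end token stays whole.
print(tokenize("Hello<|im_end|>", vocab, special_tokens=["<|im_end|>"]))
# -> ['Hello', '<|im_end|>']

# Broken: end token shatters into sub-word pieces, as seen in this report.
print(tokenize("Hello<|im_end|>", vocab, special_tokens=[]))
# -> ['Hello', '<', '|', 'im', '_', 'end', '|', '>']
```

The degradation follows directly: if the model was trained to emit and stop on a single end-token id, feeding it the shattered multi-token form weakens the stop signal.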
First Bad Commit
No response
Relevant log output
Relevant broken tokenization.
