
EoS Tokenization issue for Nemo 12b

Open Catgat opened this issue 1 month ago • 0 comments

Name and Version

llama.cpp-b3999

Operating systems

Windows

GGML backends

CUDA

Hardware

2x RTX 3090 i7-7820X

Models

cgato/Nemo-12b-Humanize-KTO-v0.1

bartowski/Nemo-12b-Humanize-KTO-v0.1-GGUF

Problem description & steps to reproduce

When llama.cpp tokenizes the end token with a Nemo model, the result deviates from how Hugging Face Transformers handles it: the ChatML end token (`<|im_end|>`) is split into several sub-tokens instead of being tokenized as a single special token, which degrades model performance.

(Image: llama.cpp tokenization splitting the end token.)

This issue does not occur with HF Transformers or ExLlama2 (example of the expected tokenization below).

(Image: expected tokenization from HF Transformers / ExLlama2.)

This should be reproducible with any Nemo-based model that uses a prompt format like ChatML, which places EoS tokens inside the prompt. I'm unsure whether the issue is Nemo-specific.
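To illustrate the failure mode described above, here is a toy sketch (not llama.cpp's actual code, and the vocabulary and token IDs are invented) of greedy longest-match tokenization. When a string like `<|im_end|>` is registered as a special token it maps to a single ID; when special-token matching is skipped, the same string falls apart into ordinary sub-tokens, which is the behavior observed in the screenshots.

```python
# Toy vocabulary: one special token plus the pieces it would shatter into.
# All IDs are hypothetical, chosen only for this illustration.
VOCAB = {"<|im_end|>": 2, "<": 10, "|": 11, "im": 12, "_end": 13, ">": 14, "|>": 15}
SPECIAL = {"<|im_end|>"}

def greedy_tokenize(text: str, use_special: bool = True) -> list[int]:
    """Greedy longest-match tokenization over VOCAB.

    When use_special is False, special tokens are not matched whole,
    so the string decomposes into smaller vocabulary pieces.
    """
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest piece first
            piece = text[i:j]
            if piece in VOCAB and (use_special or piece not in SPECIAL):
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"untokenizable input at position {i}")
    return ids

print(greedy_tokenize("<|im_end|>"))                     # single special ID
print(greedy_tokenize("<|im_end|>", use_special=False))  # shattered pieces
```

The expected behavior (HF Transformers, ExLlama2) corresponds to the first call; the broken behavior reported here corresponds to the second.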

First Bad Commit

No response

Relevant log output

Relevant broken tokenization.
![Image](https://github.com/user-attachments/assets/61ea8b42-b8e0-44be-8404-0ef2d37b26a2)

Catgat avatar Jan 19 '25 15:01 Catgat