llama.cpp
EoS Tokenization issue for Nemo 12b
Name and Version
llama.cpp-b3999
Operating systems
Windows
GGML backends
CUDA
Hardware
2x RTX 3090 i7-7820X
Models
cgato/Nemo-12b-Humanize-KTO-v0.1
bartowski/Nemo-12b-Humanize-KTO-v0.1-GGUF
Problem description & steps to reproduce
When tokenizing the end token with Nemo, llama.cpp deviates from how HuggingFace transformers handles tokenization: the ChatML end token is split into several sub-word tokens instead of being tokenized as a single special token, which degrades model performance.
This issue does not occur with HF Transformers or ExLlama2 (example of the expected tokenization below).
It should be reproducible with any Nemo-based model that uses a prompt format like ChatML, where EoS tokens appear inside the prompt text. I'm unsure whether the issue is Nemo-specific or not.
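To make the failure mode concrete, here is a toy sketch (not llama.cpp's actual tokenizer code; the vocab and helper are invented for illustration) of the difference between a tokenizer that matches special tokens whole and one that falls back to sub-word pieces, which is the splitting behavior observed here:

```python
def tokenize(text, vocab, special_tokens):
    """Toy greedy longest-match tokenizer (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        # Special tokens, when enabled, are matched first and kept whole.
        for sp in special_tokens:
            if text.startswith(sp, i):
                tokens.append(sp)
                i += len(sp)
                break
        else:
            # Otherwise fall back to the longest matching sub-word piece.
            for end in range(len(text), i, -1):
                if text[i:end] in vocab:
                    tokens.append(text[i:end])
                    i = end
                    break
            else:
                tokens.append(text[i])
                i += 1
    return tokens

vocab = {"<", "|", "im", "_", "end", ">", "Hello"}

# Expected (HF Transformers / ExLlama2 style): end token stays whole.
print(tokenize("Hello<|im_end|>", vocab, special_tokens=["<|im_end|>"]))
# -> ['Hello', '<|im_end|>']

# Broken: end token shatters into sub-word pieces, as seen in this report.
print(tokenize("Hello<|im_end|>", vocab, special_tokens=[]))
# -> ['Hello', '<', '|', 'im', '_', 'end', '|', '>']
```

The degradation follows directly: if the model was trained to emit and stop on a single end-token id, feeding it the shattered multi-token form weakens the stop signal.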
First Bad Commit
No response
Relevant log output
Relevant broken tokenization.
