LLMs-from-scratch icon indicating copy to clipboard operation
LLMs-from-scratch copied to clipboard

Llama 3 tokenizer - special tokens instance

Open d-kleine opened this issue 4 months ago • 0 comments

Bug description

I noticed something small while looking at the Llama 3 tokenizer code and thought it might be helpful to mention:

https://github.com/rasbt/LLMs-from-scratch/blob/ece59ba58768db7b34d9b5d5f88677de8c1e84ea/pkg/llms_from_scratch/llama3.py#L315-L316

and

https://github.com/rasbt/LLMs-from-scratch/blob/ece59ba58768db7b34d9b5d5f88677de8c1e84ea/pkg/llms_from_scratch/llama3.py#L325-L326

In VS Code, the self.special_tokens instance appears greyed out for me, and after checking, I realized that it isn’t actually defined as an instance attribute anywhere in the class. I believe it should actually refer to self.special instead.

Edit: The whole encode function in llama3.py

https://github.com/rasbt/LLMs-from-scratch/blob/ece59ba58768db7b34d9b5d5f88677de8c1e84ea/pkg/llms_from_scratch/llama3.py#L312-L331

differs from the one in the converting-llama2-to-llama3.ipynb and standalone-llama32.ipynb notebooks:

https://github.com/rasbt/LLMs-from-scratch/blob/58b8672452248733a182c5669843bf097072317c/ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb#L1136-L1144

What operating system are you using?

None

Where do you run your code?

None

Environment




d-kleine avatar Jun 17 '25 17:06 d-kleine