
Add GPT-5.1 with the o200k_base encoding in `tiktoken/model.py`

Open eshaanpathak opened this issue 2 months ago • 3 comments

Similar to this PR, we need to add GPT-5.1 to `tiktoken/model.py`. I don't seem to have permission to open a PR for this, so I am filing an issue instead.

More specifically, we need to add `"gpt-5.1-": "o200k_base"` to `MODEL_PREFIX_TO_ENCODING` and `"gpt-5.1": "o200k_base"` to `MODEL_TO_ENCODING`. Without these entries, `encoding_name_for_model()` raises a `KeyError` for GPT-5.1 model names.
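To illustrate why the lookup fails, here is a minimal, self-contained sketch of an exact-then-prefix lookup. It is a simplified stand-in, not the actual tiktoken source: the two tables contain only assumed `gpt-5` entries, the function takes extra table parameters purely for demonstration, and `gpt-5.1-some-variant` is a hypothetical model name.

```python
# Simplified stand-ins for the tables in tiktoken/model.py.
# Assumption: the gpt-5 entries look like this; the real tables are much larger.
MODEL_PREFIX_TO_ENCODING = {"gpt-5-": "o200k_base"}
MODEL_TO_ENCODING = {"gpt-5": "o200k_base"}


def encoding_name_for_model(
    model_name,
    exact=MODEL_TO_ENCODING,
    prefixes=MODEL_PREFIX_TO_ENCODING,
):
    """Try an exact match first, then a prefix match; raise KeyError otherwise."""
    if model_name in exact:
        return exact[model_name]
    for prefix, encoding_name in prefixes.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"Could not map {model_name!r} to an encoding")


# The additions proposed in this issue, applied to copies of the tables.
patched_exact = {**MODEL_TO_ENCODING, "gpt-5.1": "o200k_base"}
patched_prefixes = {**MODEL_PREFIX_TO_ENCODING, "gpt-5.1-": "o200k_base"}

if __name__ == "__main__":
    # "gpt-5.1" is not an exact key and does not start with "gpt-5-".
    try:
        encoding_name_for_model("gpt-5.1")
    except KeyError as err:
        print("unpatched:", err)

    # With the proposed entries, both the bare name and suffixed names resolve.
    print(encoding_name_for_model("gpt-5.1", patched_exact, patched_prefixes))
    print(
        encoding_name_for_model(
            "gpt-5.1-some-variant", patched_exact, patched_prefixes
        )
    )
```

The key point is that `"gpt-5.1"` does not start with the prefix `"gpt-5-"` (the character after `gpt-5` is a dot, not a hyphen), so neither the exact table nor the prefix table matches until the new entries are added.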

eshaanpathak avatar Nov 13 '25 23:11 eshaanpathak

This is blocking us from updating to gpt-5.1 for chat agents

Valdegg avatar Nov 14 '25 11:11 Valdegg

Is there any public documentation confirming that the 5.1 model uses the o200k_base encoding, similar to the 5 model? I expect this to be the case, but I haven’t found anything that explicitly states it.

tarekgh avatar Nov 18 '25 19:11 tarekgh

Funnily enough, Perplexity says that both models use the same encoding and cites tiktoken as its source, in particular this very issue 😄 https://www.perplexity.ai/search/does-gpt-5-1-still-use-the-sam-345OWLZAQ7yWrv_LrEKEpw#0

Yes, GPT 5.1 continues to use the same "o200k_base" encoding as its predecessor GPT 5. Specifically, GPT-5.1 is listed with the encoding identifier "o200k_base" in model-to-encoding mappings in official libraries like tiktoken. This encoding is a continuation of the byte-level BPE-style tokenization lineage used in GPT-5, optimized for chat and tool use, and no distinct new encoding name has been published for GPT 5.1 separate from the "o200k" family. Thus, for practical and engineering purposes, GPT-5.1 remains on the same encoding as GPT-5.

https://platform.openai.com/docs/guides/latest-model#migrating-from-other-models-to-gpt-5-1

While the model should be close to a drop-in replacement for GPT-5, there are a few key changes to call out.

reneleonhardt avatar Nov 18 '25 20:11 reneleonhardt