tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Starting on January 2, 2025, we began seeing errors in our logs saying we were over the context limit when creating text-embedding-3-large embeddings with OpenAI. I believe there may have...
Hi, we are using tiktoken 0.5.2 to calculate token counts. It works most of the time, but one of our runs fails with the error `RuntimeError(StackOverflow)`...
https://openai.com/index/gpt-4-1/
At this point, it is just wrong for OpenAI to release models without updating `tiktoken` at the same time.
Hi TikToken team! 👋 I wanted to share a community resource that might be helpful for TikToken users who also work with HuggingFace tokenizers. I've created AutoTikTokenizer, a lightweight library...
### Summary
This PR replaces the use of `hashlib.sha1` with `hashlib.sha256` in `read_file_cached()`.
### Motivation
While SHA-1 is used here only for generating a deterministic cache key (not cryptographic operations),...
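The following is a minimal sketch of the idea. It assumes the cache key is simply the hex digest of the blob path or URL. `cache_key` is a hypothetical helper and is not tiktoken's actual `read_file_cached()` code.

```python
import hashlib


def cache_key(blobpath: str) -> str:
    """Return a deterministic cache file name for a blob path or URL.

    Hypothetical helper: SHA-256 only derives a stable, filesystem-safe
    name here; no security property is relied on.
    """
    return hashlib.sha256(blobpath.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(cache_key("https://example.com/encodings/example.tiktoken"))
```

In a scheme like this, the key is the digest itself. Switching from SHA-1 (40 hex characters) to SHA-256 (64 hex characters) therefore produces different file names. Files cached under the old SHA-1 name would not be found and would presumably be fetched again.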
Adds the `o4-` prefix to the `MODEL_PREFIX_TO_ENCODING` dictionary and `o4` to `MODEL_TO_ENCODING`. `4.1` has been added in [another PR](https://github.com/openai/tiktoken/pull/396).
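The sketch below shows why both dictionaries need an entry. It assumes a lookup that tries the exact model name first and then falls back to known prefixes. The dictionary names mirror those in the PR, but their contents here are illustrative. The `o200k_base` value for o4 models is an assumption for this sketch, and `lookup_encoding_name` is hypothetical.

```python
# Illustrative tables only: the "o200k_base" value for o4 models is an
# assumption for this sketch, not copied from tiktoken.
MODEL_TO_ENCODING = {"o4": "o200k_base"}
MODEL_PREFIX_TO_ENCODING = {"o4-": "o200k_base"}


def lookup_encoding_name(model_name: str) -> str:
    """Hypothetical lookup: exact model name first, then known prefixes."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"no encoding known for model {model_name!r}")


if __name__ == "__main__":
    print(lookup_encoding_name("o4"))       # exact-name match
    print(lookup_encoding_name("o4-mini"))  # prefix match
```

The bare name `o4` does not start with `o4-`, so in this sketch the prefix table alone would not match it. The exact-name table is what covers it.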
Replaced the hardcoded URL `https://openaipublic.blob.core.windows.net` with the `TIKTOKEN_BPE_HOST` environment variable, so that BPE data can be sourced from a configurable host. This change is particularly beneficial for environments where external access is restricted, such...
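The environment-variable approach described in this PR can be sketched as follows. This is not tiktoken's code. `bpe_url` is a hypothetical helper, and falling back to the original host when `TIKTOKEN_BPE_HOST` is unset or empty is an assumption, since the snippet does not say how the PR handles that case.

```python
import os

# Default host: the hardcoded URL mentioned in the PR description.
DEFAULT_BPE_HOST = "https://openaipublic.blob.core.windows.net"


def bpe_url(path: str) -> str:
    """Hypothetical helper: build a BPE file URL from a configurable host.

    Assumption: falls back to DEFAULT_BPE_HOST when TIKTOKEN_BPE_HOST
    is unset or empty.
    """
    host = os.environ.get("TIKTOKEN_BPE_HOST") or DEFAULT_BPE_HOST
    # Normalise slashes so "host/" + "/path" does not produce "//".
    return f"{host.rstrip('/')}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(bpe_url("encodings/example.tiktoken"))
```

Pointing `TIKTOKEN_BPE_HOST` at an internal mirror would then redirect every download without further code changes.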
https://openai.com/index/introducing-gpt-4-5/
Hi all, I am trying to deploy an app that uses tiktoken's encodings. I wrap tiktoken's `encoding_for_model` in a try/except with an Azure OpenAI model name, and if that doesn't work I...
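The try/except fallback can be sketched without network access by passing the lookup in as a function. tiktoken raises `KeyError` for model names it cannot map to an encoding. In real code, `lookup` might be `tiktoken.model.encoding_name_for_model` (available in recent releases), with the returned name then passed to `tiktoken.get_encoding`. `encoding_name_or_default` is hypothetical, and the `cl100k_base` default is an assumption; choose whichever encoding matches your deployed model.

```python
from typing import Callable, Dict


def encoding_name_or_default(
    model: str,
    lookup: Callable[[str], str],
    default: str = "cl100k_base",  # assumed fallback; adjust per model
) -> str:
    """Return lookup(model), or `default` if the model name is unknown.

    `lookup` is expected to raise KeyError for unrecognised names,
    such as a custom Azure deployment name.
    """
    try:
        return lookup(model)
    except KeyError:
        return default


if __name__ == "__main__":
    known: Dict[str, str] = {"gpt-4o": "o200k_base"}  # illustrative table
    print(encoding_name_or_default("gpt-4o", known.__getitem__))
    print(encoding_name_or_default("my-azure-deployment", known.__getitem__))
```

Keeping the lookup injectable makes the fallback logic testable offline. Only the final `get_encoding` call would need to download BPE data.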