tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
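To illustrate what a BPE (byte pair encoding) tokeniser does, here is a toy sketch in pure Python. It is not tiktoken's implementation: the function name `bpe_encode` and the tiny `toy_ranks` vocabulary are hypothetical and exist only for this example. The idea is to start from single bytes and repeatedly merge the adjacent pair whose merged form has the lowest rank in the vocabulary.

```python
from typing import Dict, List


def bpe_encode(text: str, ranks: Dict[bytes, int]) -> List[int]:
    """Toy BPE: merge the lowest-ranked adjacent pair until none is left."""
    # Start with one token per UTF-8 byte.
    parts = [bytes([b]) for b in text.encode("utf-8")]
    while len(parts) > 1:
        best_idx, best_rank = None, None
        # Scan adjacent pairs for the merge with the lowest rank.
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:
            break  # no adjacent pair is in the vocabulary
        # Replace the two parts with their merged bytes.
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return [ranks[p] for p in parts]


# Hypothetical vocabulary: all single bytes plus two merges.
toy_ranks = {bytes([i]): i for i in range(256)}
toy_ranks[b"ab"] = 256
toy_ranks[b"abc"] = 257

tokens = bpe_encode("abcab", toy_ranks)
print(tokens)  # [257, 256]
```

A production tokeniser additionally splits the text with a regex before merging and uses much faster data structures; this sketch only shows the merge loop.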
Has anybody seen this behavior? What could be the reason? What are my alternatives? I'm in Western Europe and https://openai.com is available
For instance, there are bug reports from users trying to run software in offline-only mode, but because those libraries use tiktoken and it goes out to download vocab files,...
I need to be able to call tiktoken in an environment where outbound traffic needs to be routed through a specific proxy. This is not possible with the current implementation...
It works on EC2, but on AWS Lambda I get this error: Traceback (most recent call last): File "/var/task/tiktoken/registry.py", line 34, in _find_constructors constructors = mod.ENCODING_CONSTRUCTORS AttributeError: module 'tiktoken_ext.__pycache__' has no...
Dear Developers, I'm pleased to inform you that I have completed the documentation update for the load, model and registry files. The updated documentation provides clear explanations of function parameters, return...
I am using tiktoken in my project, but when I deployed the project to AWS Lambda I got the error below. Unknown encoding cl100k_base. Plugins found: ['tiktoken_ext.__pycache__', 'tiktoken_ext.openai_public']. How to...
Fixes the crash in https://github.com/openai/tiktoken/issues/245 by prohibiting the regex engine from backtracking catastrophically via [possessive quantifiers](https://www.regular-expressions.info/possessive.html). Interestingly, these possessives make the encoding a lot faster again in `fancy-regex`. Before this...
Hi, I'm getting a panic when trying to encode the attached file with the gpt-4 tokenizer. This is from the AMPS dataset that was published along with the [MATH dataset](https://github.com/hendrycks/math)....
I've noticed that Tiktoken is really slow for strings of repeated characters like `"a" * 100_000`. Interestingly, when you add spaces, like `"a " * 50_000`, the performance is orders...
Currently Tiktoken (and with it all the OpenAI-related Python libraries using it) cannot be installed on systems and platforms that cannot (or are forbidden to) install Rust. This is...
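The slowdown on long runs of a repeated character reported above can be checked with a small timing harness. The sketch below is an assumption-laden illustration: `time_encode` and `placeholder_encode` are hypothetical helpers, and the placeholder encoder stands in for a real tokeniser so the snippet runs without tiktoken installed or network access. The two inputs are the ones quoted in the report.

```python
import time
from typing import Callable, Dict, List


def time_encode(encode: Callable[[str], List[int]], text: str, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) over several encode calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(text)
        best = min(best, time.perf_counter() - start)
    return best


def placeholder_encode(text: str) -> List[int]:
    """Hypothetical stand-in encoder: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


# The two inputs quoted in the report above.
cases: Dict[str, str] = {
    "no spaces": "a" * 100_000,
    "with spaces": "a " * 50_000,
}

timings = {name: time_encode(placeholder_encode, text) for name, text in cases.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

To measure tiktoken itself, pass `tiktoken.get_encoding("cl100k_base").encode` in place of `placeholder_encode`; this requires the package to be installed and the vocabulary file to be downloadable or already cached.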