
tiktoken is a fast BPE tokeniser for use with OpenAI's models.

87 tiktoken issues, sorted by recently updated

For some extremely long sequences, the tokenizer can result in a PanicException. Example:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("cl100k_base")
text = "^" * 1000000
tokenizer.encode(text)  # this throws a PanicException...
```

I made a toy GPT2 tokenizer as a Rust extension module for Python. It seems to be slightly faster than tiktoken in my tests. It looks like https://github.com/openai/tiktoken/pull/31 may get most or...
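For context, such a comparison is typically timed along these lines (a sketch only; the tiktoken side is real, while the competing extension is shown as a hypothetical comment):

```python
import timeit

import tiktoken

enc = tiktoken.get_encoding("gpt2")
text = "hello world " * 10_000

# Time tiktoken's GPT-2 encoder on a repeated sample text.
tiktoken_s = timeit.timeit(lambda: enc.encode(text), number=20)
print(f"tiktoken gpt2 encode: {tiktoken_s:.3f}s over 20 runs")

# A competing Rust extension would be timed the same way, e.g.:
# toy_s = timeit.timeit(lambda: toy_gpt2_bpe.encode(text), number=20)
```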

**The following projects are not maintained by OpenAI. I cannot vouch that any of them are correct or safe to use. Use at your own risk.** Note that if a...

It would be nice to add a `__version__` attribute similar to other Python projects, so that we can easily query the version for reproducibility reasons. I.e.,

```python
import tiktoken
print(tiktoken.__version__)
```
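One common, minimal way to provide such an attribute (a sketch only, not necessarily how tiktoken would implement it) is to read the installed package's metadata:

```python
# tiktoken/__init__.py -- illustrative sketch only
from importlib import metadata

try:
    __version__ = metadata.version("tiktoken")
except metadata.PackageNotFoundError:
    # e.g. running from an uninstalled source checkout
    __version__ = "unknown"
```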

Would it be possible to add a `wasm` target and make tiktoken available for Node.js projects? I'm currently relying on [gpt-3-encoder](https://github.com/latitudegames/GPT-3-Encoder) but would prefer to use tiktoken for performance reasons.

Split the Rust code so it can be built both as a Python library and as a Rust library, allowing it to be published to both crates.io and PyPI. Fixes #24

Not really directly useful given the Chat API... But triangulating:

- https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb
- https://github.com/openai/tiktoken/commit/ec7c121e385bf1675312c6c33734de6b392890c4#diff-0d973848bd229418209db2c46c86167000845592ca6b98fad215c21c317bc494R9

We know they exist.

Add support for the TIKTOKEN_FORCE_CACHE environment variable. When TIKTOKEN_FORCE_CACHE is set to "1", tiktoken will read BPE files only from the local cache. It allows us to have control...
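A rough sketch of what the proposed behaviour could look like (the helper name, the default cache path, and the use of TIKTOKEN_CACHE_DIR as the cache-directory override are assumptions for illustration, not tiktoken's actual internals; TIKTOKEN_FORCE_CACHE is the flag proposed here):

```python
import hashlib
import os
import tempfile

def load_bpe_bytes(blobpath: str) -> bytes:
    """Illustrative sketch of the proposed TIKTOKEN_FORCE_CACHE behaviour."""
    cache_dir = os.environ.get(
        "TIKTOKEN_CACHE_DIR",
        os.path.join(tempfile.gettempdir(), "data-gym-cache"),
    )
    cache_path = os.path.join(cache_dir, hashlib.sha1(blobpath.encode()).hexdigest())

    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()

    if os.environ.get("TIKTOKEN_FORCE_CACHE") == "1":
        # Proposed behaviour: never fall back to a network download.
        raise FileNotFoundError(
            f"{blobpath} is not in the local cache and TIKTOKEN_FORCE_CACHE=1"
        )

    raise NotImplementedError("network download elided from this sketch")
```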

It seems that the tiktoken package is not linkable from Rust using Cargo's default registry. Are there plans to publish the `tiktoken` crate? Is it published on another registry? Thanks...

# Issue
When trying to call `encoding_for_model` providing a fine-tuned model as input, the following error occurs:

```
KeyError: 'Could not automatically map davinci:ft-personal:finetunedmodel-2023-05-23-20-00-00 to a tokeniser. Please use `tiktoken.get_encoding`...
```
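Until fine-tuned names are mapped automatically, a small wrapper along these lines can work around the error (the helper name is ours, and it assumes the base model name appears before the first `:`):

```python
import tiktoken

def encoding_for_model_or_finetune(model_name: str) -> tiktoken.Encoding:
    """Illustrative workaround: fall back to the base model's encoding
    for fine-tuned names such as 'davinci:ft-personal:...'."""
    try:
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        base_model = model_name.split(":", 1)[0]  # 'davinci:ft-...' -> 'davinci'
        return tiktoken.encoding_for_model(base_model)

enc = encoding_for_model_or_finetune("davinci:ft-personal:finetunedmodel-2023-05-23-20-00-00")
print(enc.name)  # 'r50k_base', the encoding of the base davinci model
```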