
tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Results: 87 tiktoken issues

Currently the BPE_FILE is hardcoded to the path https://openaipublic.blob.core.windows.net/.... This hardcoded host presents a challenge when running in a container in an AWS VPC environment. It would be great if we...
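A sketch of one workaround, assuming the environment-variable cache hook tiktoken is commonly used with: pointing `TIKTOKEN_CACHE_DIR` at a directory of pre-downloaded BPE files baked into the container image, so no outbound call to the blob host is needed. The path below is illustrative.

```python
import os

# Assumption: tiktoken consults the TIKTOKEN_CACHE_DIR environment variable
# and serves pre-downloaded BPE files from it instead of fetching them from
# openaipublic.blob.core.windows.net. Set it before tiktoken is imported.
os.environ["TIKTOKEN_CACHE_DIR"] = "/opt/tiktoken_cache"  # illustrative path

# import tiktoken  # subsequent get_encoding() calls would check the cache first
print(os.environ["TIKTOKEN_CACHE_DIR"])
```

The files must be stored under the hashed names the cache expects, which is why pre-populating the directory from a machine with network access is the usual approach.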

This is a housekeeping code change suggestion. The project is released under the MIT license per the `LICENSE` file's contents; however, the current metadata notation makes handling that information...

This allows the source to be built on Debian Bookworm using the packages provided by Debian.

Should be a matter of putting `debug = true` in `Cargo.toml`. Before: ``` num_threads: 1, num_bytes: 100005605 tiktoken 6603873.890819601 bytes / s huggingface 1668452.742767104 bytes / s webtext encode tiktoken...
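A minimal sketch of the suggested change, assuming the intent is to keep debug symbols in release builds so profilers and backtraces show meaningful frames; `[profile.release]` is the usual place for this setting:

```toml
# Cargo.toml — retain debug symbols in optimised release builds.
# Does not disable optimisations; it only changes what symbol info is emitted.
[profile.release]
debug = true
```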

## Usage help

Check out this awesome tokeniser app https://tiktokenizer.vercel.app/ built by [Diagram](https://diagram.com/)! Check out the [OpenAI cookbook](https://github.com/openai/openai-cookbook)! In particular, the following are great examples of using `tiktoken`: - [How...

Builds on #50 to add Ruby bindings. Mostly leaving it here for awareness. It'd be great to merge in some of these refactors and/or publish the Rust library so folks...

Minor patch to enable ppc64le wheels. There is no content change :-)

Code example: ```py3 enc = tiktoken.get_encoding("cl100k_base") enc.decode([100256]) ``` Trace: ```py3 thread '' panicked at 'no entry found for key', src/lib.rs:210:37 --------------------------------------------------------------------------- PanicException Traceback (most recent call last) /var/folders/m9/s4s3bdq96pn3dk13fbgpw6rm0000gn/T/ipykernel_9548/1299473396.py in 1...
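The panic happens because id 100256 has no entry in cl100k_base's reverse vocabulary, so the Rust-side lookup fails. A pure-Python sketch of the failure mode and a defensive wrapper (the tiny vocabulary below is illustrative, not tiktoken's real table, and `safe_decode` is a hypothetical helper, not tiktoken's API):

```python
# Illustrative reverse vocabulary: token id -> bytes. In the real encoder this
# map has ~100k entries; an absent id is what triggers the Rust-side
# "no entry found for key" panic.
decoder = {0: b"he", 1: b"llo", 2: b" world"}

def safe_decode(token_ids, unknown=b"\xef\xbf\xbd"):
    """Decode known ids; substitute U+FFFD replacement bytes for unknown ids
    instead of raising."""
    return b"".join(decoder.get(t, unknown) for t in token_ids)

print(safe_decode([0, 1, 2]).decode("utf-8"))
print(safe_decode([0, 99999]).decode("utf-8"))
```

Whether tiktoken itself should raise a Python-level `KeyError` instead of panicking is the substance of the report; the wrapper above only shows the shape of a graceful fallback.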

Hey, considering its superiority over SPE tokenizers, would you provide some sample/example code to train a tiktoken tokenizer from scratch on a custom dataset? Also, like training BPE/SPE, does it...
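tiktoken ships pretrained vocabularies rather than a training API, but the underlying BPE training loop is simple to sketch: repeatedly merge the most frequent adjacent pair of tokens. A toy, pure-Python sketch (the corpus stands in for a custom dataset; this is the algorithm's shape, not tiktoken's Rust implementation):

```python
from collections import Counter

def train_bpe(corpus: bytes, num_merges: int):
    """Toy BPE trainer: start from single bytes, then repeatedly merge the
    most frequent adjacent pair into a new token."""
    seq = [bytes([b]) for b in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats; further merges gain nothing
        merges.append(a + b)
        # Re-tokenise the sequence with the new merge applied greedily.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return merges, seq

merges, tokens = train_bpe(b"low low lower lowest", num_merges=10)
print(merges)
```

A production trainer additionally needs pre-tokenisation (tiktoken's encodings split text with a regex before byte-level BPE), frequency counting over words rather than one flat sequence, and a much larger merge budget.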