Tokenizer

TypeScript and .NET implementation of the BPE tokenizer for OpenAI LLMs.

Results 7 Tokenizer issues

Hello, the TS package doesn't work in web browsers because it uses `fs` and `path`. Are there any plans to support web browsers? Cheers!
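One way to avoid the `fs` dependency is to parse rank data from a string that the caller has already obtained, however it was loaded. The sketch below assumes the rank data is in the tiktoken-style text format (one base64-encoded token and its integer rank per line). The function name `parseBpeRanks` is hypothetical and is not part of the package's API.

```typescript
// Hypothetical helper: parse the text of a tiktoken-style rank file.
// Assumed format: one "<base64 token> <rank>" pair per line.
// Keys stay base64-encoded so the sketch needs neither Buffer nor atob.
function parseBpeRanks(fileText: string): Map<string, number> {
  const ranks = new Map<string, number>();
  for (const line of fileText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") {
      continue; // skip blank lines, including a trailing newline
    }
    const sep = trimmed.indexOf(" ");
    if (sep < 0) {
      throw new Error("Malformed rank line: " + trimmed);
    }
    const token = trimmed.slice(0, sep);
    const rank = parseInt(trimmed.slice(sep + 1), 10);
    if (isNaN(rank)) {
      throw new Error("Malformed rank value: " + trimmed);
    }
    ranks.set(token, rank);
  }
  return ranks;
}
```

In a browser the input string could come from `await (await fetch(url)).text()`, while Node code could keep reading it from disk, so only the loading step would differ between the two environments.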

Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.

Commits:
- 74b2db2 3.0.3
- 88f1429 update eslint. lint, fix unit tests.
- 415d660 Snyk js braces 6838727 (#40)
- 190510f fix tests, skip 1 test in test/braces.expand...

dependencies

Updated the ITokenizer interface to include two new methods for counting tokens in a string, with options for handling special tokens. Implemented these methods in the TikTokenizer class, adding logic...
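To illustrate what counting tokens with a special-token option can look like, here is a minimal sketch. The interface `TokenCounter` and the class `ToyCharCounter` are hypothetical and are not the library's `ITokenizer` or `TikTokenizer`. The toy counter treats every ordinary character as one token, which a real BPE tokenizer would not do. Only the special-token handling is the point.

```typescript
// Hypothetical interface for illustration only; not the library's ITokenizer.
interface TokenCounter {
  // Special tokens listed in allowedSpecial count as one token each;
  // any other occurrence is counted as ordinary text.
  countTokens(text: string, allowedSpecial?: ReadonlySet<string>): number;
}

// Toy implementation: one token per ordinary character (not real BPE).
class ToyCharCounter implements TokenCounter {
  private readonly specialTokens: readonly string[];

  constructor(specialTokens: readonly string[]) {
    this.specialTokens = specialTokens;
  }

  countTokens(
    text: string,
    allowedSpecial: ReadonlySet<string> = new Set<string>()
  ): number {
    let count = 0;
    let i = 0;
    while (i < text.length) {
      // Look for an allowed special token starting at position i.
      const special = this.specialTokens.find(
        (s) => allowedSpecial.has(s) && text.startsWith(s, i)
      );
      count += 1;
      i += special !== undefined ? special.length : 1;
    }
    return count;
  }
}
```

With `new ToyCharCounter(["<|endoftext|>"])`, the string `"ab<|endoftext|>"` counts as 3 tokens when `<|endoftext|>` is in the allowed set and as 15 when it is not, because the marker is then counted character by character.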

Llama3.1's tokenizer is also BPE-based. We already ship this library in VS Code's Copilot Chat extension and it would be great if Llama3.1 was supported so that we do not...

When tokenizing the files in the https://github.com/Kotlin/kotlinx.serialization repo, the `cl100k_base` tokenizer struggled on the following files:
- [n_structure_open_array_object.json](https://github.com/Kotlin/kotlinx.serialization/blob/master/formats/json-tests/jvmTest/resources/spec_cases/n_structure_open_array_object.json) took 53.7s to tokenize
- [n_structure_100000_opening_arrays.json](https://github.com/Kotlin/kotlinx.serialization/blob/master/formats/json-tests/jvmTest/resources/spec_cases/n_structure_100000_opening_arrays.json) took 6.9s to tokenize

While the...
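A small timing harness can help reproduce reports like this one. The sketch below is an assumption-laden illustration: `timeTokenize` and `placeholderTokenize` are hypothetical names, the placeholder stands in for a real encoder, and the synthetic input assumes (as the file name suggests) that the second file is a run of 100000 `[` characters.

```typescript
// Hypothetical harness: time a single call of a tokenizer-like function.
// `tokenize` is a stand-in parameter; pass the real encoder to measure it.
function timeTokenize(
  tokenize: (text: string) => number[],
  input: string
): { tokens: number; ms: number } {
  const start = Date.now();
  const tokens = tokenize(input).length;
  return { tokens, ms: Date.now() - start };
}

// Synthetic input modeled on n_structure_100000_opening_arrays.json
// (assumed here to be 100000 opening brackets).
const deepArrays = "[".repeat(100000);

// Placeholder so the sketch runs standalone: one token per character.
// It does not reproduce the slowdown; it only exercises the harness.
const placeholderTokenize = (text: string): number[] =>
  Array.from(text, (ch) => ch.charCodeAt(0));

const timing = timeTokenize(placeholderTokenize, deepArrays);
```

Replacing `placeholderTokenize` with the real `cl100k_base` encoder, and repeating the measurement at a few input lengths, would show whether tokenization time grows faster than linearly on this kind of input.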