Tokenizer
TypeScript and .NET implementation of the BPE tokenizer for OpenAI LLMs.
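For readers unfamiliar with BPE (byte-pair encoding), the core idea can be sketched as a greedy merge loop: start from single symbols and repeatedly merge the adjacent pair with the lowest rank in a merge table. This is a toy illustration only, not this library's implementation. The function `bpeMerge` and the table `toyRanks` are hypothetical names, the ranks are made up rather than taken from a real OpenAI vocabulary, and the sketch works on characters where real tokenizers work on bytes after a regex pre-split.

```typescript
// Toy BPE merge loop. Illustrative only: the ranks below are invented
// and do not correspond to any real vocabulary such as cl100k_base.
type Ranks = Map<string, number>;

function bpeMerge(piece: string, ranks: Ranks): string[] {
  // Start from individual characters (real tokenizers start from bytes).
  let parts: string[] = Array.from(piece);

  while (parts.length > 1) {
    // Find the adjacent pair with the lowest rank (highest merge priority).
    let bestIndex = -1;
    let bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = ranks.get(parts[i] + parts[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIndex = i;
      }
    }

    // Stop when no adjacent pair appears in the merge table.
    if (bestIndex === -1) break;

    // Replace the two symbols with their concatenation.
    parts = [
      ...parts.slice(0, bestIndex),
      parts[bestIndex] + parts[bestIndex + 1],
      ...parts.slice(bestIndex + 2),
    ];
  }
  return parts;
}

// Hypothetical merge table: a lower number means the merge is applied earlier.
const toyRanks: Ranks = new Map([
  ["lo", 0],
  ["low", 1],
  ["er", 2],
]);

const tokens = bpeMerge("lower", toyRanks);
console.log(tokens); // ["low", "er"]
```

The loop rescans every pair after each merge, so it is quadratic in the length of the piece. That is fine for a sketch, but long inputs need a more careful data structure.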
Hello, the TS package doesn't work in web browsers because it uses `fs` and `path`. Are there any plans to support web browsers? Cheers!
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.

Commits:
- 74b2db2 3.0.3
- 88f1429 update eslint. lint, fix unit tests.
- 415d660 Snyk js braces 6838727 (#40)
- 190510f fix tests, skip 1 test in test/braces.expand...
Updated the ITokenizer interface to include two new methods for counting tokens in a string, with options for handling special tokens. Implemented these methods in the TikTokenizer class, adding logic...
Llama3.1's tokenizer is also BPE-based. We already ship this library in VS Code's Copilot Chat extension and it would be great if Llama3.1 was supported so that we do not...
When tokenizing the files in the https://github.com/Kotlin/kotlinx.serialization repo, the `cl100k_base` tokenizer struggled on the following files:

- [n_structure_open_array_object.json](https://github.com/Kotlin/kotlinx.serialization/blob/master/formats/json-tests/jvmTest/resources/spec_cases/n_structure_open_array_object.json) took 53.7s to tokenize
- [n_structure_100000_opening_arrays.json](https://github.com/Kotlin/kotlinx.serialization/blob/master/formats/json-tests/jvmTest/resources/spec_cases/n_structure_100000_opening_arrays.json) took 6.9s to tokenize

While the...