tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

407 tokenizers issues

Creating another issue for tokenizers support on Alpine. Error:

```
error: Cannot find package 'tokenizers-linux-x64-musl' from '/usr/src/app/node_modules/tokenizers/index.js'
Bun v1.1.38 (Linux x64 baseline)
/usr/src/app # ./mycli
155 | if (isMusl()) {...
```

Hi, while installing the tokenizers library after the 0.21.0 release, the installation fails on Unix (Python 3.9.16): it is looking for a Rust installation. 12:27:55 Looking in indexes: https://pypi.org/simple...

For example:

```python
def post_processor(self, token_ids_0, token_ids_1=None):
    if "cls" in token_ids_0:
        return processors.TemplateProcessing(
            single=f"{cls} $A {sep}",
            pair=f"{cls} $A {sep} $B {cls}",
            special_tokens=[
                (cls, cls_token_id),
                (sep, sep_token_id),
            ],
        )
    else:
        return...
```
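The snippet above chooses between templates for `processors.TemplateProcessing`. As background, the documented template syntax writes `$A` and `$B` for the two sequences and lets a piece carry a `:type_id` suffix (for example `"[CLS] $A [SEP] $B:1 [SEP]:1"`), with type ID 0 when the suffix is omitted. The toy below mimics that expansion in pure Python so the resulting ids and type IDs are easy to inspect. It is a sketch, not the library's Rust implementation: `parse_piece` and `apply_template` are hypothetical helpers, and only the `$A`, `$B`, and `special[:type_id]` forms are handled.

```python
# Toy, pure-Python illustration of template-style post-processing.
# Hypothetical helpers; NOT the tokenizers library's implementation.

def parse_piece(piece):
    """Split "name:type_id" into (name, type_id); type_id defaults to 0."""
    name, sep, type_id = piece.rpartition(":")
    if sep and type_id.isdigit():
        return name, int(type_id)
    return piece, 0


def apply_template(template, seq_a, seq_b=None, special_ids=None):
    """Expand a template such as "[CLS] $A [SEP] $B:1 [SEP]:1" into (ids, type_ids).

    seq_a and seq_b are lists of token ids; special_ids maps each special-token
    string used in the template to its id.
    """
    special_ids = special_ids or {}
    ids, type_ids = [], []
    for piece in template.split():
        name, type_id = parse_piece(piece)
        if name == "$A":
            tokens = seq_a
        elif name == "$B":
            if seq_b is None:
                raise ValueError("template uses $B but no second sequence was given")
            tokens = seq_b
        else:
            tokens = [special_ids[name]]  # KeyError if the special token is unknown
        ids.extend(tokens)
        # Every token produced by a piece gets that piece's type_id.
        type_ids.extend([type_id] * len(tokens))
    return ids, type_ids


special = {"[CLS]": 101, "[SEP]": 102}
print(apply_template("[CLS] $A [SEP] $B:1 [SEP]:1", [7, 8], [9], special))
# ([101, 7, 8, 102, 9, 102], [0, 0, 0, 0, 1, 1])
```

Because the type ID is attached per piece, a pair template that closes with an un-suffixed special token gives that token type ID 0 even though it follows `$B:1`.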

`is_pretokenized` doesn't seem to be respected in some cases. The same code, given below, works in 0.20.0. Code:

```python
from tokenizers import Tokenizer, pre_tokenizer
from tokenizers.models import WordPiece
m...
```
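For readers unfamiliar with the flag: with `is_pretokenized=True`, the input is a list of words the caller has already split, and the model tokenizes within those word boundaries. The toy below illustrates the idea with a pure-Python greedy longest-match-first WordPiece split. `wordpiece_word` and `encode_pretokenized` are hypothetical names, and the sketch skips normalization and any pre-tokenizer the real pipeline may still apply inside each word.

```python
# Toy greedy longest-match-first WordPiece, applied to input the caller has
# already split into words (the idea behind is_pretokenized=True).
# Hypothetical helpers; NOT the tokenizers library's implementation.

def wordpiece_word(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def encode_pretokenized(words, vocab):
    """Tokenize each caller-supplied word independently; word boundaries are kept."""
    return [piece for word in words for piece in wordpiece_word(word, vocab)]


vocab = {"un", "##aff", "##able", "hello", "[UNK]"}
print(encode_pretokenized(["unaffable", "hello", "xyz"], vocab))
# ['un', '##aff', '##able', 'hello', '[UNK]']
```

Since every word is handled on its own, pieces never span two caller-supplied words, which is the property a regression in `is_pretokenized` handling would break.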

I can't use this on my X Elite PC: the platform is ARM, but the OS is Windows.

Hi, I trained a sentencepiece tokenizer with prefix match. After converting it to an HF tokenizer, the tokenization results are not consistent with the slow tokenizer. In sentencepiece, we can choose whether to...

System Info:

- `transformers` version: 4.45.2
- Platform: Linux-5.4.0-193-generic-x86_64-with-glibc2.31
- Python version: 3.12.7
- Huggingface_hub version: 0.25.2
- Safetensors version: 0.4.5
- Accelerate version: not installed
- Accelerate config:...

bug

A piece is applied with a specific `type_id`, regardless of whether it is a `SpecialToken` or a `Sequence`. But when the `piece` is a `sequence` with overflowing tokens, it only...
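The report is cut off, so the exact failure is not visible here. As a mental model only: when a `Sequence` piece is truncated with overflow, it yields a main window plus overflowing windows, and one would expect every window to keep the piece's `type_id`. The pure-Python sketch below encodes that assumption. `split_with_overflow` and `tag_windows` are hypothetical helpers, and the library's real truncation rules (which side is cut, how pairs are handled) may differ.

```python
# Toy sketch: split a sequence into a main window plus overflowing windows,
# then give every window the type_id of the piece it came from.
# Assumption: overflows should keep the piece's type_id (the report is truncated).

def split_with_overflow(ids, max_len, stride=0):
    """Return windows of at most max_len ids; consecutive windows share `stride` ids."""
    if max_len <= stride:
        raise ValueError("max_len must be greater than stride")
    step = max_len - stride  # always >= 1, so the loop terminates
    windows, start = [], 0
    while True:
        windows.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            return windows
        start += step


def tag_windows(windows, type_id):
    """Pair each window with a type_ids list filled with the same type_id."""
    return [(window, [type_id] * len(window)) for window in windows]


windows = split_with_overflow([1, 2, 3, 4, 5, 6, 7], max_len=3, stride=1)
for window, type_ids in tag_windows(windows, type_id=1):
    print(window, type_ids)
# [1, 2, 3] [1, 1, 1]
# [3, 4, 5] [1, 1, 1]
# [5, 6, 7] [1, 1, 1]
```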

Closes #1906. This adds a wheel build for Windows arm64 to `python-release.yml`.

#1869 added Windows arm64 tests in [`CI.yml`](https://github.com/huggingface/tokenizers/blob/main/.github/workflows/CI.yml) but didn't add any builds to [`python-release.yml`](https://github.com/huggingface/tokenizers/blob/main/.github/workflows/python-release.yml). Would there be any appetite for adding Windows arm64 to the Python release workflow and releasing...
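Why a missing build matters for installs: pip picks a prebuilt wheel only when its platform tag matches the interpreter, and otherwise falls back to building from the source distribution, which for tokenizers needs a Rust toolchain. A Windows arm64 Python reports its platform as `win-arm64` via `sysconfig.get_platform()`, which corresponds to the `win_arm64` wheel tag. The sketch below shows that normalization with the standard library. `wheel_platform_tag` is a hypothetical helper, and on Linux real installers also consider `manylinux`/`musllinux` tags, which this sketch does not compute.

```python
# Sketch: derive the base wheel platform tag for an interpreter.
# Hypothetical helper; real installers (pip/packaging) add manylinux/musllinux logic.
import sysconfig


def wheel_platform_tag(platform_name=None):
    """Normalize a sysconfig platform name (e.g. "win-arm64") into a wheel tag."""
    name = platform_name or sysconfig.get_platform()
    # Wheel tags use underscores where sysconfig uses hyphens and dots.
    return name.replace("-", "_").replace(".", "_")


print(wheel_platform_tag("win-arm64"))  # win_arm64
print(wheel_platform_tag("win-amd64"))  # win_amd64
print(wheel_platform_tag())             # depends on the current machine
```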