Michael Partheil


I just ran a short benchmark; on my machine it is 47x faster for encoding than the Rust-based `CLIPTokenizerFast` implementation from `transformers`:

```python
from transformers import CLIPTokenizerFast
from instant_clip_tokenizer import...
```
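The full benchmark code is truncated above. As a sketch of the single-input timing methodology it implies, the following self-contained harness uses only the standard library; `tokenize_slow` and `tokenize_fast` are hypothetical stand-ins for the real tokenizers (e.g. a `CLIPTokenizerFast` instance and an `instant_clip_tokenizer` tokenizer):

```python
import time

# Hypothetical stand-ins for the two tokenizers under comparison.
# In the real benchmark these would be replaced by calls into
# transformers' CLIPTokenizerFast and instant_clip_tokenizer.
def tokenize_slow(text):
    return text.lower().split()

def tokenize_fast(text):
    return text.split()

def bench(fn, text, iterations=10_000):
    """Time `iterations` single-input calls and return total seconds elapsed."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(text)
    return time.perf_counter() - start

text = "a photo of a cat sitting on a windowsill"
t_slow = bench(tokenize_slow, text)
t_fast = bench(tokenize_fast, text)
print(f"speedup: {t_slow / t_fast:.1f}x")
```

Timing many repeated single-input calls (rather than one call) reduces timer noise, which matters when the per-call cost is in the microsecond range.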

> Have you tried batch tokenization?

We mostly care about tokenization performance for single inputs (we use it for inference). Nevertheless, we provide a `tokenize_batch` method which is around 3x...