Michael Partheil


I just ran a short benchmark; on my machine it is 47x faster for encoding than the Rust-based `CLIPTokenizerFast` implementation from `transformers`:

```python
from transformers import CLIPTokenizerFast
from instant_clip_tokenizer import...
```
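The full benchmark code is truncated above. As a sketch of the single-input timing methodology it implies, the following self-contained harness uses only the standard library; `tokenize_slow` and `tokenize_fast` are hypothetical stand-ins for the real tokenizers (e.g. a `CLIPTokenizerFast` instance and an `instant_clip_tokenizer` tokenizer):

```python
import time

# Hypothetical stand-ins for the two tokenizers under comparison.
# In the real benchmark these would be replaced by calls into
# transformers' CLIPTokenizerFast and instant_clip_tokenizer.
def tokenize_slow(text):
    return text.lower().split()

def tokenize_fast(text):
    return text.split()

def bench(fn, text, iterations=10_000):
    """Time `iterations` single-input calls and return total seconds elapsed."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(text)
    return time.perf_counter() - start

text = "a photo of a cat sitting on a windowsill"
t_slow = bench(tokenize_slow, text)
t_fast = bench(tokenize_fast, text)
print(f"speedup: {t_slow / t_fast:.1f}x")
```

Timing many repeated single-input calls (rather than one call) reduces timer noise, which matters when the per-call cost is in the microsecond range.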

> Have you tried batch tokenization?

We mostly care about tokenization performance for single inputs (we use it for inference). Nevertheless, we provide a `tokenize_batch` method which is around 3x...