If split_special_tokens==True, the fast tokenizer is slower than the slow tokenizer
from transformers import LlamaTokenizer, LlamaTokenizerFast
import time

# Slow (Python/SentencePiece) and fast (Rust-backed) tokenizers, both with split_special_tokens=True
tokenizer1 = LlamaTokenizer.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)      # LlamaTokenizer (slow)
tokenizer2 = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)  # LlamaTokenizerFast (fast)
print(tokenizer1, tokenizer2)

s_time = time.time()
for i in range(1000):
    tokenizer1.tokenize("你好,where are you?" * 100)
print(f"slow: {time.time() - s_time}")

s_time = time.time()
for i in range(1000):
    tokenizer2.tokenize("你好,where are you?" * 100)
print(f"fast: {time.time() - s_time}")
output:
slow: 0.6021890640258789
fast: 0.7353882789611816
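(For reference, split_special_tokens controls whether special tokens that appear in the raw input text are kept whole or split into ordinary subword pieces. A minimal sketch of the difference, assuming the same local checkpoint as above; the exact pieces printed depend on the tokenizer version:

from transformers import LlamaTokenizerFast

tok_keep = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf")                             # default behavior
tok_split = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)  # split specials

text = "<s> hello"
print(tok_keep.tokenize(text))   # "<s>" kept whole as a single special token
print(tok_split.tokenize(text))  # "<s>" broken into ordinary subword pieces
)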
If I use * 1000 instead of * 100, this is what I get on my small machine:
slow: 7.805477857589722
fast: 7.280818223953247
In general we don't look too heavily into micro-benchmarks (unless the gap is on the order of 10x); they don't usually tell a very compelling story. For instance, you could be using batch tokenization here, which should be much faster with the fast tokenizer.
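A minimal sketch of what that batched benchmark could look like, assuming the same local checkpoint as above. Passing a list of texts in a single call lets the Rust backend process them in one go (and typically in parallel), which is where the fast tokenizer usually pulls ahead of per-string loops:

import time
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)
texts = ["你好,where are you?" * 100] * 1000

s_time = time.time()
tokenizer(texts)  # one batched call instead of 1000 single tokenize() calls
print(f"fast, batched: {time.time() - s_time}")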