If split_special_tokens==True, the fast tokenizer is slower than the slow tokenizer
from transformers import LlamaTokenizer, LlamaTokenizerFast
import time

# Slow (Python/SentencePiece) and fast (Rust-backed) tokenizers, both with split_special_tokens=True
tokenizer1 = LlamaTokenizer.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)      # LlamaTokenizer (slow)
tokenizer2 = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)  # LlamaTokenizerFast (fast)
print(tokenizer1, tokenizer2)

s_time = time.time()
for i in range(1000):
    tokenizer1.tokenize("你好,where are you?" * 100)
print(f"slow: {time.time() - s_time}")

s_time = time.time()
for i in range(1000):
    tokenizer2.tokenize("你好,where are you?" * 100)
print(f"fast: {time.time() - s_time}")
output:
slow: 0.6021890640258789
fast: 0.7353882789611816
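(For reference, split_special_tokens controls whether special tokens that appear in the raw input text are kept whole or split into ordinary subword pieces. A minimal sketch of the difference, assuming the same local checkpoint as above; the exact pieces printed depend on the tokenizer version:

from transformers import LlamaTokenizerFast

tok_keep = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf")                             # default behavior
tok_split = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)  # split specials

text = "<s> hello"
print(tok_keep.tokenize(text))   # "<s>" kept whole as a single special token
print(tok_split.tokenize(text))  # "<s>" broken into ordinary subword pieces
)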
If I use * 1000 instead of * 100, this is what I get on my small machine:
slow: 7.805477857589722
fast: 7.280818223953247
In general we don't look too heavily into micro-benchmarks (unless the gap is on the order of 10x); they don't usually tell a very compelling story. For instance, you could be using batch tokenization here, which should be much faster with the fast tokenizer.
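A minimal sketch of what that batched benchmark could look like, assuming the same local checkpoint as above. Passing a list of texts in a single call lets the Rust backend process them in one go (and typically in parallel), which is where the fast tokenizer usually pulls ahead of per-string loops:

import time
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)
texts = ["你好,where are you?" * 100] * 1000

s_time = time.time()
tokenizer(texts)  # one batched call instead of 1000 single tokenize() calls
print(f"fast, batched: {time.time() - s_time}")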