keras-nlp The BytePairTokenizer class is extremely, extremely slow at tokenizing


Open chenying99 opened this issue 9 months ago • 4 comments

Vocabulary size: 6400

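import time

# `tokenizer` is the BytePairTokenizer instance built with the 6400-entry vocabulary.
text = "Are you OK? "
start = time.time()
for i in range(10):
    tokenizer.tokenize(text + str(i))
end = time.time()
print(end - start)  # elapsed wall-clock seconds for all 10 calls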

Result: 3.8366940021514893 seconds for 10 calls, roughly 0.38 seconds per call.
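
To see whether that time comes from one-time setup on the first call or is paid on every call, the measurement can be split into a warm-up phase and a steady-state phase. The sketch below is not keras-nlp code: `time_calls` and `fake_tokenize` are hypothetical names introduced here, and `fake_tokenize` is only a stand-in that should be swapped for `tokenizer.tokenize` when reproducing the issue.

```python
import time
from typing import Callable, Sequence


def time_calls(fn: Callable[[str], object], inputs: Sequence[str], warmup: int = 1) -> dict:
    """Time fn on each input, reporting warm-up and steady-state cost separately."""
    # Warm-up calls absorb any one-time cost (lazy initialization, cache fills).
    t0 = time.perf_counter()
    for text in inputs[:warmup]:
        fn(text)
    warmup_s = time.perf_counter() - t0

    # Steady-state calls show what each later call costs on its own.
    per_call = []
    for text in inputs[warmup:]:
        t0 = time.perf_counter()
        fn(text)
        per_call.append(time.perf_counter() - t0)

    return {
        "warmup_s": warmup_s,
        "steady_total_s": sum(per_call),
        "steady_mean_s": sum(per_call) / len(per_call) if per_call else 0.0,
        "calls": len(per_call),
    }


def fake_tokenize(text: str) -> list:
    # Hypothetical stand-in for the tokenizer; replace with tokenizer.tokenize.
    return text.split()


inputs = ["Are you OK? " + str(i) for i in range(10)]
stats = time_calls(fake_tokenize, inputs, warmup=1)
print(stats)
```

If `steady_mean_s` stays close to the per-call figure above, the slowness is per-call overhead rather than first-call setup.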

Jan 23 '25 21:01 chenying99