huggingface-tokenizer-in-cxx

seems to be slow

Open shenfe opened this issue 1 year ago • 3 comments

Good work~ But I ran some tests and found that this C++ implementation seems to be slow: less than 10 tokens per millisecond. Any more tests or findings?

shenfe avatar Mar 25 '23 22:03 shenfe

I change it to use instead of RE2. Speed is normal now.

shenfe avatar Mar 26 '23 15:03 shenfe

Instead of re2, what are you using?

wangkuiyi avatar Mar 26 '23 18:03 wangkuiyi

[Attached images: IMG_0553, IMG_0554]

I think human trafficking activities are related to this. Sometimes you can see it make mistakes, revealing that web content is mutated. I wonder if shenfe sees the dropped "regex" word in this issue thread.

xloem avatar May 30 '24 03:05 xloem