LLMZoo

About the tokenizer

Open • yuyq96 opened this issue • 2 comments

Why is the tokenizer class specified in the tokenizer config BloomTokenizer? The transformers implementation only provides BloomTokenizerFast.

yuyq96 • Apr 19 '23 01:04
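To see which tokenizer class a checkpoint asks for, one can read the "tokenizer_class" entry of its tokenizer_config.json and check whether the installed transformers package exports a class with that name. The following is a minimal Python sketch under those assumptions; the helper names configured_tokenizer_class and class_is_available are hypothetical, and model_dir stands for a local checkpoint directory.

```python
import json
from pathlib import Path
from typing import Optional


def configured_tokenizer_class(model_dir: str) -> Optional[str]:
    """Return the "tokenizer_class" entry of <model_dir>/tokenizer_config.json, or None if absent."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("tokenizer_class")


def class_is_available(class_name: str) -> bool:
    """Return True if the installed transformers package exposes a name class_name."""
    try:
        import transformers  # optional dependency; if missing, nothing is available
    except ImportError:
        return False
    try:
        return hasattr(transformers, class_name)
    except Exception:  # lazy imports can fail for reasons other than a missing name
        return False
```

For a checkpoint whose config names BloomTokenizer, configured_tokenizer_class(model_dir) would return "BloomTokenizer", and class_is_available("BloomTokenizer") is expected to return False, since (as noted in this thread) transformers only ships BloomTokenizerFast for BLOOM.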

Hi @yuyq96,

Thanks for your attention!

Could you upgrade transformers to the latest development version (pip install git+https://github.com/huggingface/transformers) and try again?

Best, Zhihong

zhjohnchan • Apr 19 '23 02:04

There is no BloomTokenizer even in the latest transformers source (main branch): https://github.com/huggingface/transformers/tree/main/src/transformers/models/bloom

yuyq96 • Apr 19 '23 07:04

Hi yuyq96,

I have the same issue. Did you find a solution? Thanks in advance.


dickchanym • May 01 '23 04:05

@dickchanym I manually switched to BloomTokenizerFast and it works fine.

yuyq96 • May 04 '23 08:05
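One way to apply the switch to BloomTokenizerFast is to edit the checkpoint's tokenizer_config.json so that "tokenizer_class" names BloomTokenizerFast. The Python sketch below does that in place; it assumes a local checkpoint directory, and the helper name switch_to_fast_tokenizer is hypothetical. Alternatively, loading the tokenizer directly with BloomTokenizerFast.from_pretrained(model_dir) sidesteps the lookup of the class by name.

```python
import json
from pathlib import Path


def switch_to_fast_tokenizer(model_dir: str) -> bool:
    """Point tokenizer_config.json at BloomTokenizerFast instead of BloomTokenizer.

    Edits <model_dir>/tokenizer_config.json in place and returns True if it
    changed anything; returns False if the class was not "BloomTokenizer".
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if config.get("tokenizer_class") != "BloomTokenizer":
        return False  # nothing to fix (already fast, or a different class)
    config["tokenizer_class"] = "BloomTokenizerFast"
    # Every other key is kept; only the class name changes.
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return True
```

After the edit, AutoTokenizer.from_pretrained(model_dir) should be able to resolve the class by name. Editing the file is a one-time change to the checkpoint, whereas calling BloomTokenizerFast directly leaves the files untouched but must be repeated in every loading script.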