llama3 icon indicating copy to clipboard operation
llama3 copied to clipboard

Can I use the transformers.AutoTokenizer to load the tokenizer?

Open tian969 opened this issue 9 months ago • 4 comments

I know the tokenizer.py in this Repo use TikTokenizer, can I use transformers.AutoTokenizer to load the tokenizer so that I dont need to amend my code class? And if i not use tokenizer.py, ChatFormat can not be used too.

tian969 avatar Apr 25 '24 15:04 tian969

I mean transformers.PretrainedTokenizer class

tian969 avatar Apr 25 '24 16:04 tian969

same question

ppaanngggg avatar Apr 26 '24 05:04 ppaanngggg

I find the solution, you should use model files on huggingface. There is a tokenizer.json file can be loaded directly.

ppaanngggg avatar Apr 28 '24 23:04 ppaanngggg

Yes, you can use AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct)

subramen avatar May 01 '24 17:05 subramen