LLMZoo

Have you extended the vocabulary size of the tokenizer?

Open • mohataher opened this issue 1 year ago • 1 comment

Nice work on your repo. I'm curious whether you achieved your results on the Chinese and multilingual models with or without resizing the tokenizer's vocabulary.

Specifically, did you extend LLaMA's vocabulary to support the multilingual training?

Please let me know.

mohataher, May 14 '23 10:05

Hi @mohataher,

Thanks! We did not extend the vocabulary for LLaMA.

Best, Zhihong

zhjohnchan, Jun 04 '23 11:06
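
For readers unsure what "extending the vocabulary" would involve, here is a minimal, dependency-free sketch of the general idea: new tokens get fresh ids, and the embedding table grows by one row per new token. This is not what LLMZoo does (per the reply above, the vocabulary was left unchanged), and the function `extend_vocab` and the toy data are hypothetical names made up for illustration. With Hugging Face `transformers`, the equivalent step is usually `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
import random


def extend_vocab(vocab, embeddings, new_tokens, seed=0):
    """Return copies of vocab and embeddings with unseen tokens appended.

    Assumes token ids are contiguous (0..len(vocab)-1), so the next free id
    is always len(vocab). Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    vocab = dict(vocab)                          # do not mutate the caller's vocab
    embeddings = [row[:] for row in embeddings]  # copy existing rows unchanged
    for tok in new_tokens:
        if tok in vocab:
            continue                             # already known, keep its old id
        vocab[tok] = len(vocab)                  # assign the next free id
        # New rows start as small random vectors and would need training.
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab, embeddings


# Toy data: a 3-token vocabulary with 4-dimensional embeddings.
base_vocab = {"<unk>": 0, "hello": 1, "world": 2}
base_emb = [[0.0] * 4 for _ in base_vocab]

new_vocab, new_emb = extend_vocab(base_vocab, base_emb, ["你好", "世界", "hello"])
print(len(base_vocab), len(new_vocab), len(new_emb))
```

Without this step, text in other languages is still tokenized, just with the original vocabulary, which typically means more tokens per word.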