LLMZoo Have you extended the vocabulary size of the tokenizer?

Open • mohataher opened this issue 1 year ago • 1 comment

Nice work on your repo. I'm curious whether you achieved your results for the Chinese and multilingual models with or without resizing the tokenizer's vocabulary.

Did you extend the LLaMA vocabulary to support multilingual training?

Please let me know.

May 14 '23 10:05 mohataher

Hi @mohataher,

Thanks! We did not extend the vocabulary for LLaMA.

Best, Zhihong

Jun 04 '23 11:06 zhjohnchan
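
For readers unfamiliar with what "extending the vocabulary" would involve, here is a minimal, library-free sketch. It is not the LLMZoo authors' code (they state above that they did not extend the vocabulary). The names `is_vocab_extended` and `extend_vocab` are hypothetical, the mean-row initialization of new embeddings is just one common heuristic chosen for illustration, and the sketch assumes token ids are contiguous from 0 and that the vocabulary and embedding table have the same length. The only external fact used is that the original LLaMA tokenizer has 32,000 entries.

```python
from typing import Dict, List

# Vocabulary size of the original LLaMA tokenizer.
LLAMA_BASE_VOCAB_SIZE = 32000


def is_vocab_extended(vocab_size: int, base_size: int = LLAMA_BASE_VOCAB_SIZE) -> bool:
    """Return True if a tokenizer has more entries than the base vocabulary."""
    return vocab_size > base_size


def extend_vocab(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
) -> int:
    """Toy illustration of vocabulary extension (hypothetical helper).

    Adds each unseen token to `vocab` with the next free id and appends one
    embedding row per added token. New rows are initialized to the mean of
    the rows that existed before the call (an assumption made for this
    sketch, not a prescribed method). Returns the number of tokens added.
    """
    dim = len(embeddings[0])
    # Mean of the pre-existing rows, computed once before any row is appended.
    mean_row = [sum(row[i] for row in embeddings) / len(embeddings) for i in range(dim)]
    added = 0
    for token in new_tokens:
        if token in vocab:
            continue  # already present, nothing to add
        vocab[token] = len(vocab)
        embeddings.append(list(mean_row))
        added += 1
    return added
```

With a real model, the analogous steps in Hugging Face `transformers` would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. A checkpoint whose tokenizer still reports 32,000 entries, as the reply above implies for LLMZoo, has not gone through this step.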
