LLMZoo
Have you extended the vocabulary size of the tokenizer?
Nice work on your repo. I'm curious whether you achieved your results on the Chinese and multilingual models with or without resizing the tokenizer's vocabulary.
Did you extend LLaMA's vocabulary to support multilingual training?
Please let me know.
Hi @mohataher,
Thanks! We did not extend the vocabulary size of LLaMA.
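
For reference, if one did want to extend the vocabulary, a typical approach with Hugging Face Transformers would look roughly like the sketch below. This is not part of our pipeline; the model path and the added tokens are placeholders for illustration only.

```python
from transformers import LlamaTokenizer, LlamaForCausalLM

# Illustrative only: LLMZoo keeps the original LLaMA tokenizer unchanged.
# The path and token list below are placeholders, not our training setup.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama")
model = LlamaForCausalLM.from_pretrained("path/to/llama")

# Add new (e.g. Chinese) tokens to the tokenizer vocabulary.
new_tokens = ["你好", "世界"]  # placeholder tokens
num_added = tokenizer.add_tokens(new_tokens)
print(f"Added {num_added} tokens; new vocab size: {len(tokenizer)}")

# Resize the embedding matrix to match the enlarged vocabulary;
# the newly added rows are randomly initialized and need further training.
model.resize_token_embeddings(len(tokenizer))
```
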
Best, Zhihong