Replacing OPT LLM with other LLMs
Hi, thanks a lot for your great work. I am evaluating replacing the OPT LLM with other LLMs such as Mistral-7B-v0.1 or Phi-3-mini-4k-instruct. I had to make minor code modifications to support these models, mainly adding a [PAD] token to their tokenizers. However, training is not stable (many NaNs in the training loss) and the accuracy results are much worse than with the original OPT 6.7B model. Do you have any suggestion as to why this happens, and if so, how it can be fixed? Thanks in advance, Ofer
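
For context, the tokenizer change was roughly along these lines (a minimal sketch using the Hugging Face `transformers` API; the model name is just an example, not the exact code I used):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Example model without a dedicated pad token (name used only for illustration).
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register the new [PAD] special token and resize the embedding matrix
# so the new token id has a row in the input/output embeddings.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
```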