Replacing OPT LLM with other LLMs
Hi, thanks a lot for your great work. I am evaluating replacing the OPT LLM with other LLMs such as Mistral-7B-v0.1 or Phi-3-mini-4k-instruct. I had to make minor code modifications to support these models, mainly adding a [PAD] token to their tokenizers. However, training is not stable (many NaNs in the training loss) and the accuracy results are much worse than with the original OPT 6.7B model. Do you have any suggestion as to why this happens, and if so, how it can be fixed? Thanks in advance, Ofer
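
For context, the tokenizer change was roughly along these lines (a minimal sketch using the Hugging Face `transformers` API; the model name is just an example, not the exact code I used):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Example model without a dedicated pad token (name used only for illustration).
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register the new [PAD] special token and resize the embedding matrix
# so the new token id has a row in the input/output embeddings.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
```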