Phi-3CookBook
LoRA fine-tuning Phi-3.5 MoE
Hi,
I recently fine-tuned the Phi-3.5-MoE-instruct and Phi-3.5-mini-instruct models using PEFT LoRA. The MoE model performs much worse than the 3.5 Mini. Are there any specific things to keep in mind when LoRA fine-tuning a mixture-of-experts model? Also, during MoE fine-tuning the validation loss shows as "No log".
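
For reference, here is a minimal sketch of roughly how I set up the run. The `target_modules` names, hyperparameters, and dataset variables are placeholders/assumptions rather than a verified recipe for this architecture, so please correct anything that looks off:

```python
# Minimal sketch of my LoRA setup (module names and datasets are
# assumptions, not a verified recipe for Phi-3.5-MoE).
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
)
from peft import LoraConfig, get_peft_model

model_id = "microsoft/Phi-3.5-MoE-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# LoRA on the attention projections only. The names below are my
# assumption -- check model.named_modules() for the real ones.
# Whether the expert FFN / router layers should also be targeted
# is part of my question.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# "No log" in the Trainer's progress table usually means no loss was
# recorded in that logging window (e.g. eval never ran, or
# logging_steps exceeds the steps per epoch). I set these and still
# see it for the MoE run:
training_args = TrainingArguments(
    output_dir="./phi35-moe-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    eval_strategy="steps",  # "evaluation_strategy" on older transformers
    eval_steps=50,
    bf16=True,
)

# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,  # my instruction-tuning split
#     eval_dataset=eval_dataset,
#     tokenizer=tokenizer,
# )
# trainer.train()
```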