LLaMA-Pro

Thanks for the wonderful project! Why do I always see an apparent loss of the model's original abilities?

Open hzgdeerHo opened this issue 1 year ago • 8 comments

After finetuning llama-3-8B-instruct with the same configuration as the example from https://github.com/hiyouga/LLaMA-Factory/tree/3df986c6793a51ec2cb5f31fd1808cd3a9883bc4/examples/extras/llama_pro, I always see an apparent loss of the model's original abilities. I only used the "Identity" training dataset. Can you help? Thanks!

hzgdeerHo avatar May 17 '24 14:05 hzgdeerHo
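For context, the llama_pro example in LLaMA-Factory is a two-step recipe: first expand the model by interleaving identity-initialized decoder blocks, then train only those new blocks. Below is a minimal, conceptual sketch of the expansion step (not the project's actual script), assuming the standard Hugging Face `LlamaForCausalLM` layout; the number of inserted blocks and the output path are made-up examples, and the zero-initialization of `o_proj`/`down_proj` follows the LLaMA-Pro paper.

```python
# Minimal sketch of LLaMA-Pro-style block expansion (illustrative only).
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16
)
layers = model.model.layers              # original decoder blocks (32 for an 8B model)
num_expand = 8                           # hypothetical number of blocks to insert
stride = len(layers) // num_expand

expanded = torch.nn.ModuleList()
for i, layer in enumerate(layers):
    expanded.append(layer)
    if (i + 1) % stride == 0:            # insert a copy after every `stride` blocks
        new_layer = copy.deepcopy(layer)
        # Zero the output projections so the new block is an identity map at init
        # (the residual connection carries the signal through unchanged).
        new_layer.self_attn.o_proj.weight.data.zero_()
        new_layer.mlp.down_proj.weight.data.zero_()
        expanded.append(new_layer)

model.model.layers = expanded
model.config.num_hidden_layers = len(expanded)
# Real expansion scripts also re-number each layer's `layer_idx` for the KV cache;
# that bookkeeping is omitted here for brevity.
model.save_pretrained("llama-3-8b-instruct-pro")   # hypothetical output path
```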

The final training loss is about 0.05-0.1, so I think it might not be caused by overfitting?

hzgdeerHo avatar May 17 '24 14:05 hzgdeerHo

Hi! Have you tried directly finetuning llama-3-8B-instruct? What happens in that setting? I did not run experiments with llama-3, so I am not very familiar with its characteristics. You could also try changing the positions of the added blocks. The recent Yi tech report and some llama3-120B models suggest that keeping the first few layers fixed may be important. Hope this helps!

hills-code avatar May 18 '24 02:05 hills-code
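To make the suggestion concrete, here is a minimal sketch of the freeze pattern the block-expansion recipe relies on: everything stays frozen except the inserted blocks, so "fixing the first few layers" amounts to not placing any new blocks among them. The checkpoint name is the hypothetical output from the sketch above, and the block indices are examples that depend on where the blocks were inserted.

```python
# Minimal sketch: train only the newly inserted blocks, freeze everything else.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "llama-3-8b-instruct-pro", torch_dtype=torch.bfloat16   # hypothetical expanded checkpoint
)
new_block_indices = {4, 9, 14, 19, 24, 29, 34, 39}           # example positions of inserted blocks

for param in model.parameters():
    param.requires_grad = False              # freeze the original weights

for idx, layer in enumerate(model.model.layers):
    if idx in new_block_indices:
        for param in layer.parameters():
            param.requires_grad = True       # only the inserted blocks receive gradients

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```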

OK, thanks! Could you share some links as references to help me figure out the problem?

hzgdeerHo avatar May 18 '24 03:05 hzgdeerHo

Certainly! Here is the link to Yi-9B (https://huggingface.co/01-ai/Yi-9B) and its tech report (https://arxiv.org/pdf/2403.04652); the depth upscaling is described in Sec. 7.3. There is also LLaMa3-120B: https://huggingface.co/alpindale/goliath-120b

hills-code avatar May 18 '24 04:05 hills-code
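For readers who don't open the report: the depth upscaling referenced there duplicates a span of middle decoder layers while leaving the early and late layers untouched. The sketch below only illustrates that stacking pattern; the split points are invented, not the values from the Yi report or the 120B merges.

```python
# Rough sketch of depth upscaling: duplicate a middle span of decoder layers.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16
)
layers = list(model.model.layers)
keep_head, keep_tail = 8, 8                      # illustrative: leave early and late layers as-is
middle = layers[keep_head:len(layers) - keep_tail]

stacked = (layers[:keep_head]
           + middle + [copy.deepcopy(l) for l in middle]   # middle span appears twice
           + layers[len(layers) - keep_tail:])
model.model.layers = torch.nn.ModuleList(stacked)
model.config.num_hidden_layers = len(stacked)
```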

Thanks !

hzgdeerHo avatar May 18 '24 04:05 hzgdeerHo

I have posted a new issue: https://github.com/hiyouga/LLaMA-Factory/issues/3811. Would you please help explain it? Thanks!

hzgdeerHo avatar May 19 '24 14:05 hzgdeerHo

Training on a small dataset for a large number of epochs can easily lead to overfitting.

hiyouga avatar May 19 '24 15:05 hiyouga
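As a practical guard against that, one can hold out a validation split, evaluate every epoch, and keep the best checkpoint. The sketch below uses plain Hugging Face `TrainingArguments`; the hyperparameters are illustrative, not a recommendation specific to this issue.

```python
# Minimal sketch: evaluate each epoch and roll back to the best checkpoint.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,                 # fewer passes over a small dataset
    per_device_train_batch_size=4,
    learning_rate=1e-5,
    eval_strategy="epoch",              # named `evaluation_strategy` in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,        # restore the best-scoring checkpoint after training
    metric_for_best_model="eval_loss",
)
```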

Thanks!

hzgdeerHo avatar May 20 '24 00:05 hzgdeerHo