llama2 - loss declines too slowly
Hi everyone,
I am fine-tuning Llama 2, but the loss is declining very slowly and I am not sure why. Previously I fine-tuned Llama 1, and the loss dropped much faster.
The picture below is the loss curve from my Llama 2 fine-tuning run:
[W&B Chart 2023_7_28 23_20_34]: https://user-images.githubusercontent.com/129425169/256863237-d80be3c3-c9bf-437a-9b42-556c13945769.png
If anyone has run into a similar problem and knows how to solve it, I would appreciate any suggestions.
Thanks!!!!!!
I'm not sure what dataset you are training on or what hardware you're using, but what I see here is a pretty decent gradual slope that probably has a way to go.
Your loss curve is influenced by a lot of things that may be different from your Llama 1 training: learning rate/scheduler, dataset, seed, batch size, trainer, peft/bitsandbytes configuration, etc.
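For reference, these are the knobs I would diff first between the two runs. This is only a minimal sketch assuming a Hugging Face peft + Trainer setup; every name and value below is illustrative, not taken from your configuration:

```python
# Minimal sketch of a peft + Trainer setup; all names and values are
# illustrative, not the configuration used in this issue.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                                   # LoRA rank
    lora_alpha=16,                         # LoRA scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # which projections get adapters
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama2-finetune",
    learning_rate=2e-4,             # often the single biggest factor
    lr_scheduler_type="cosine",     # scheduler shape changes the curve a lot
    warmup_ratio=0.03,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size matters too
    num_train_epochs=3,
    seed=42,
    logging_steps=10,
)
```

If the Llama 1 run used a noticeably higher peak learning rate or a different scheduler, that alone can explain a much flatter curve.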
Have you tried a higher learning rate? I'm noticing the same slow loss decline myself while fine-tuning Llama 2-7B, compared to when I fine-tuned Falcon-7B...
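If you want to test that cheaply, one option is a short probe run at a few candidate learning rates before committing to a full epoch. A rough sketch assuming the Hugging Face Trainer; the values are just starting points I would try, not recommendations from this thread:

```python
from transformers import TrainingArguments

# Illustrative LR sweep: a few hundred steps per candidate, then compare the
# logged loss curves. Everything here is hypothetical.
for lr in (2e-5, 1e-4, 3e-4):
    probe_args = TrainingArguments(
        output_dir=f"llama2-lr-{lr}",
        learning_rate=lr,
        max_steps=300,        # short probe run, not a full epoch
        warmup_steps=30,
        logging_steps=10,
        report_to="none",
    )
    # trainer = Trainer(model=model, args=probe_args, train_dataset=train_ds)
    # trainer.train()
```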
Why does my LLaMA-7B model produce NaN loss right at the start of training?
@wangzhonghai Hi, have you solved this problem yet?
I ran into the same problem when trying to PEFT fine-tune CodeLlama-7B (using LlamaForSequenceClassification): the loss is always NaN during fine-tuning.
Thanks!
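In case it helps anyone else hitting the NaN issue, here is a minimal checklist-style sketch of the usual suspects for NaN loss with Llama-family classifiers (missing pad token, fp16 overflow, too-high learning rate). The checkpoint name, label count, and values are illustrative assumptions, not details confirmed in this thread:

```python
import torch
from transformers import AutoTokenizer, LlamaForSequenceClassification, TrainingArguments

model_name = "codellama/CodeLlama-7b-hf"   # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    # Llama tokenizers ship without a pad token; leaving it unset is a common
    # cause of broken loss in sequence classification.
    tokenizer.pad_token = tokenizer.eos_token

model = LlamaForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,                  # illustrative label count
    torch_dtype=torch.bfloat16,    # bf16 is usually more stable than fp16 here
)
model.config.pad_token_id = tokenizer.pad_token_id

training_args = TrainingArguments(
    output_dir="codellama-cls",
    bf16=True,                 # prefer bf16 over fp16 if the GPU supports it
    learning_rate=1e-5,        # drop the LR if the loss still blows up
    max_grad_norm=1.0,         # gradient clipping
    logging_steps=10,
)
```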