
llama2 - loss declines too slowly

Open YunzeSong opened this issue 2 years ago • 5 comments

Hi everyone,

I am fine-tuning llama2, but the loss is declining very slowly, and I am a little confused about the reason. Before this, I fine-tuned llama1, and the loss dropped much faster.

The picture below is the loss curve from my llama2 fine-tuning run.

I hope anyone who has run into a similar problem and knows how to solve it can give me some suggestions.

[Image: W&B loss chart, 2023-07-28 — https://user-images.githubusercontent.com/129425169/256863237-d80be3c3-c9bf-437a-9b42-556c13945769.png]

Thanks!

YunzeSong avatar Jul 28 '23 15:07 YunzeSong

I'm not sure what dataset you are training on or what hardware you're using, but what I see here is a pretty decent gradual slope that probably has a way to go.

Your loss curve is influenced by a lot of things that may be different from your llama1 training: learning rate/scheduler, dataset, seed, batch size, trainer, PEFT/bitsandbytes configuration, etc.
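For concreteness, here is a minimal sketch of where each of those knobs lives in a typical Hugging Face Trainer + PEFT setup. Every hyperparameter value below is an illustrative assumption, not a setting taken from this thread — any one of them differing between your llama1 and llama2 runs can change the slope you see:

```python
# Illustrative sketch only: all values are assumptions, not recommendations.
from transformers import TrainingArguments
from peft import LoraConfig

training_args = TrainingArguments(
    output_dir="llama2-ft",
    learning_rate=2e-4,              # learning rate
    lr_scheduler_type="cosine",      # scheduler
    warmup_ratio=0.03,
    per_device_train_batch_size=4,   # batch size
    gradient_accumulation_steps=4,   # multiplies the effective batch size
    seed=42,                         # seed
    num_train_epochs=3,
)

peft_config = LoraConfig(            # PEFT configuration
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```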

binaryninja avatar Jul 28 '23 15:07 binaryninja

Have you tried a higher learning rate? I'm noticing the slow loss decline myself fine-tuning Llama2-7b, compared to when I fine-tuned Falcon-7b...
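One cheap way to test that is a short sweep over learning rates and a comparison of the early loss curves. A hedged sketch, assuming a standard Trainer setup — the learning rates below just bracket a range, they are not recommendations:

```python
# Illustrative sketch: the learning rates are guesses to bracket a range.
from transformers import TrainingArguments

for lr in (2e-5, 1e-4, 3e-4):
    args = TrainingArguments(
        output_dir=f"sweep/lr-{lr}",
        learning_rate=lr,
        lr_scheduler_type="cosine",
        warmup_steps=100,               # warmup guards against early instability
        max_steps=200,                  # short runs suffice to compare slopes
        run_name=f"llama2-7b-lr-{lr}",  # keeps the W&B curves distinguishable
    )
    # ...build your Trainer with `args` and call trainer.train() as usual
```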

Jasonli1997 avatar Aug 01 '23 05:08 Jasonli1997

Why does my loss go to NaN as soon as I start training the LLaMA 7B model?
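Not a diagnosis, but two commonly reported causes worth ruling out, sketched under those assumptions below: Llama-2 checkpoints are distributed in bfloat16, and fine-tuning Llama models in fp16 is a frequently reported source of NaN losses; an aggressive learning rate without gradient clipping can also blow up in the first steps.

```python
# Sketch of common NaN mitigations (assumptions to rule out, not a confirmed fix).
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Llama-2 weights are bfloat16; loading/training in fp16 can overflow to NaN.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,   # prefer bf16 over fp16 where the GPU supports it
)

args = TrainingArguments(
    output_dir="out",
    bf16=True,                    # rather than fp16=True
    max_grad_norm=1.0,            # gradient clipping guards against loss spikes
    learning_rate=2e-5,           # a too-aggressive LR can also produce NaN
)
```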

wangzhonghai avatar Aug 08 '23 06:08 wangzhonghai

@wangzhonghai Hi, have you solved this problem yet?

I ran into the same problem when trying to PEFT fine-tune CodeLLama-7B (using LlamaForSequenceClassification): the loss is always NaN during fine-tuning.

Thanks!
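For the LlamaForSequenceClassification case specifically, one frequently reported cause (an assumption here, not a confirmed diagnosis for this run) is a missing pad token: Llama tokenizers ship without one, and the classification head uses `model.config.pad_token_id` to locate the last real token of each padded sequence. A minimal sketch:

```python
# Sketch assuming the NaN comes from a missing pad token; num_labels is illustrative.
from transformers import AutoTokenizer, LlamaForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
tokenizer.pad_token = tokenizer.eos_token   # reuse EOS as the pad token

model = LlamaForSequenceClassification.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    num_labels=2,                            # illustrative label count
)
model.config.pad_token_id = tokenizer.pad_token_id
```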

sssszh avatar Apr 03 '24 14:04 sssszh