LongChat About the learning rate

About the learning rate

Open lucasjinreal opened this issue 1 year ago • 1 comments

From the provided script, LongChat appears to use full-parameter SFT rather than LoRA, but the total effective batch size (batch_size * gradient_accum * num_gpus) is just 1.

The original Vicuna training in fschat is also full-parameter SFT, but it uses an effective batch size of 128. Why is the learning rate different? And which setting should be adopted with only two 80 GB GPUs?
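To make the batch-size arithmetic in the question concrete, here is a minimal sketch of how the effective batch size is computed, and how many gradient accumulation steps would be needed to reach 128 on two GPUs. The function names and the per-device batch size of 1 are assumptions for illustration only; they are not taken from the LongChat or fschat scripts, and the sketch says nothing about which learning rate is appropriate.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accum_steps, num_gpus):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch_size * gradient_accum_steps * num_gpus


def accum_steps_for_target(target_batch_size, per_device_batch_size, num_gpus):
    """Smallest accumulation step count that reaches at least the target."""
    samples_per_forward = per_device_batch_size * num_gpus
    return math.ceil(target_batch_size / samples_per_forward)


# Hypothetical setting matching the "effective batch size of 1" reading.
print(effective_batch_size(1, 1, 1))

# Hypothetical setting: two GPUs, per-device batch size 1 (assumed), target 128.
steps = accum_steps_for_target(128, per_device_batch_size=1, num_gpus=2)
print(steps)
print(effective_batch_size(1, steps, 2))
```

With these assumed values, reaching an effective batch size of 128 on two GPUs takes 64 accumulation steps; a larger per-device batch size, if it fits in memory, reduces that count proportionally.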

Jul 14 '23 02:07 lucasjinreal
