Strange training dynamics for ImageReward model.

Open bhattg opened this issue 2 years ago • 3 comments

Hi! I am trying to train a reward model, and I am confused about why, in the initial iterations of training, neither the gradients nor the loss change. Only after some steps do they suddenly change, after which learning completes.

The learning dynamics are shown in the attached screenshot.

[Screenshot: Screen Shot 2023-10-08 at 6 21 46 PM]

bhattg avatar Oct 09 '23 01:10 bhattg

Hello, which version of python and cuda are you using? Thank you.

learn01one avatar Oct 17 '23 02:10 learn01one

This is a very interesting discovery, and I believe it may be related to the learning rate schedule and warm-up settings, although there could be other factors worth exploring.

xujz18 avatar Nov 05 '23 07:11 xujz18

Hello, sorry for the late reply to the version question: I am using Python 3.10.13 and CUDA 11.7.

Experiment was run using torch 1.13.0

Regarding the learning dynamics, I am using the following settings:

--fix_rate 0.7 --lr 1e-05 --lr-decay-style cosine --warmup 0.0 --batch_size 32 --accumulation_steps 1 --epochs 50
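
To make the warm-up question concrete, here is a minimal sketch of what these flags would imply for the learning rate, assuming a generic linear warm-up followed by cosine decay. The function `lr_at`, the `min_lr` parameter, and the `total_steps` value are hypothetical and not taken from the ImageReward code, so the repository's actual scheduler may behave differently.

```python
import math


def lr_at(step, total_steps, base_lr=1e-5, warmup=0.0, min_lr=0.0):
    """Generic linear warm-up + cosine decay (an assumed schedule, not the repo's)."""
    # `warmup` is treated as a fraction of total steps, mirroring `--warmup 0.0`.
    warmup_steps = int(warmup * total_steps)
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; depends on dataset size, batch size, and epochs
    for step in (0, 10, 100, 500, 999):
        print(step, f"{lr_at(step, total_steps):.3e}")
```

Under this assumed schedule, `--warmup 0.0` means the learning rate is already at the full 1e-05 at step 0, so an initial flat period would not come from a warm-up ramp. Whether the repository's scheduler matches this would need to be checked against its code.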

bhattg avatar Nov 06 '23 18:11 bhattg