
Training ImageReward model on different budgets

Open bhattg opened this issue 2 years ago • 5 comments

Hi! The paper mentions that training the ImageReward model is not easy and is sensitive to hyperparameters. The section on hyperparameters says: "We find that fixing 70% of transformer layers with a learning rate of 1e-5 and batch size of 64 can reach up to the best preference accuracy."

Is this for the 8k budget? Could you share suitable hyperparameters for the other budgets?

Secondly, which part of the code freezes the transformer layers? Thanks!

bhattg avatar Sep 01 '23 02:09 bhattg

Thanks for your discussion! Firstly, the hyperparameters are for the 8k budget (though the data shuffle may differ, so it's worth trying slightly different values). Secondly, see https://github.com/THUDM/ImageReward/blob/main/train/src/ImageReward.py#L87-L99.
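For readers without the repository at hand, the idea at those lines is to freeze the first 70% of the transformer layers and pass only the remaining trainable parameters to the optimizer. A minimal PyTorch sketch of that pattern (`freeze_fraction` and the toy layer stack are illustrative, not the repository's actual code):

```python
import torch
import torch.nn as nn

def freeze_fraction(layers, fix_rate=0.7):
    """Freeze the first `fix_rate` fraction of a stack of transformer layers.

    Hypothetical helper: `layers` is any sequence of nn.Module; the real
    ImageReward code applies the same requires_grad toggle to its BLIP backbone.
    """
    n_fixed = int(len(layers) * fix_rate)
    for layer in layers[:n_fixed]:
        for p in layer.parameters():
            p.requires_grad = False  # excluded from gradient computation
    return n_fixed

# Toy 12-layer stack standing in for the transformer encoder.
stack = nn.ModuleList(nn.Linear(8, 8) for _ in range(12))
n_fixed = freeze_fraction(stack, fix_rate=0.7)  # int(12 * 0.7) = 8 layers frozen

# Only parameters that still require gradients go to the optimizer,
# with the learning rate quoted from the paper.
optimizer = torch.optim.AdamW(
    (p for p in stack.parameters() if p.requires_grad), lr=1e-5
)
```

The same effect can be had by passing all parameters to the optimizer, since frozen ones receive no gradients, but filtering keeps the optimizer state smaller.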

xujz18 avatar Sep 01 '23 16:09 xujz18

Thank you very much!

bhattg avatar Sep 03 '23 22:09 bhattg

Hey, would it be possible to provide the hyperparameters for the 1k and 4k settings as well? That would be very useful.

bhattg avatar Sep 04 '23 20:09 bhattg

The 8k hyperparameters should only need small adjustments to accommodate the 1k/2k/4k budgets.

xujz18 avatar Sep 06 '23 08:09 xujz18

Thanks! In your experience, which hyperparameters were the most sensitive? I will try to tune them.

bhattg avatar Sep 11 '23 01:09 bhattg