
Training ImageReward model on different budgets

Open bhattg opened this issue 2 years ago • 5 comments

Hi! The paper mentions that training the ImageReward model is not easy and is sensitive to hyperparameters. The section on hyperparameters says: "We find that fixing 70% of transformer layers with a learning rate of 1e-5 and batch size of 64 can reach up to the best preference accuracy."

Is this for the 8k budget? Could you share suitable hyperparameters for the other budgets?

Secondly, which part of the code freezes the transformer layers? Thanks!

bhattg avatar Sep 01 '23 02:09 bhattg

Thanks for your discussion! Firstly, the hyperparameters are for the 8k budget (though the data shuffle may differ, so it's worth trying slightly different values). Secondly, see https://github.com/THUDM/ImageReward/blob/main/train/src/ImageReward.py#L87-L99.
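For readers without the repository at hand, the idea at those lines is to freeze the first 70% of the transformer layers and pass only the remaining trainable parameters to the optimizer. A minimal PyTorch sketch of that pattern (`freeze_fraction` and the toy layer stack are illustrative, not the repository's actual code):

```python
import torch
import torch.nn as nn

def freeze_fraction(layers, fix_rate=0.7):
    """Freeze the first `fix_rate` fraction of a stack of transformer layers.

    Hypothetical helper: `layers` is any sequence of nn.Module; the real
    ImageReward code applies the same requires_grad toggle to its BLIP backbone.
    """
    n_fixed = int(len(layers) * fix_rate)
    for layer in layers[:n_fixed]:
        for p in layer.parameters():
            p.requires_grad = False  # excluded from gradient computation
    return n_fixed

# Toy 12-layer stack standing in for the transformer encoder.
stack = nn.ModuleList(nn.Linear(8, 8) for _ in range(12))
n_fixed = freeze_fraction(stack, fix_rate=0.7)  # int(12 * 0.7) = 8 layers frozen

# Only parameters that still require gradients go to the optimizer,
# with the learning rate quoted from the paper.
optimizer = torch.optim.AdamW(
    (p for p in stack.parameters() if p.requires_grad), lr=1e-5
)
```

The same effect can be had by passing all parameters to the optimizer, since frozen ones receive no gradients, but filtering keeps the optimizer state smaller.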

xujz18 avatar Sep 01 '23 16:09 xujz18

Thank you very much!

bhattg avatar Sep 03 '23 22:09 bhattg

Hey, would it be possible to provide the hyperparameters for the 1k and 4k settings as well? That would be very useful.

bhattg avatar Sep 04 '23 20:09 bhattg

The 8k hyperparameters should only need small adjustments to accommodate the 1k/2k/4k budgets.

xujz18 avatar Sep 06 '23 08:09 xujz18

Thanks! In your experience, which hyperparameters were the most sensitive? I will try to tune them.

bhattg avatar Sep 11 '23 01:09 bhattg