ReFL implementation details
As mentioned in #24 and #34, in the current ReFL code only the ReFL loss is implemented; the pre-training loss is not included. In addition, the two losses are optimized alternately.

I want to add the pre-training data myself. Without gradient accumulation, the pseudo-code would look like this:
```python
# Given an optimizer and lr_scheduler built around unet.

# 1) Compute the pre-training loss `train_loss` with unet and update unet.
train_loss.backward()
optimizer.step()
lr_scheduler.step()  # is it necessary here, or only once per iteration?
optimizer.zero_grad()

# 2) Compute the ReFL loss `refl_loss` with unet and update unet.
refl_loss.backward()
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
```
However, after reading this post I'm still confused about how to wrap this with `accelerator.accumulate(unet)` for gradient accumulation. I also raised huggingface/accelerate#1870 and started discussions in the Hugging Face accelerate GitHub repo and forum, but I haven't gotten a clear answer. Could you give me some pseudo-code or hints? Thank you very much! @xujz18 @tongyx361
This may duplicate https://github.com/THUDM/ImageReward/issues/34#issuecomment-1687365733. I was afraid it wouldn't be seen in a closed issue, so I opened this new issue.
Your understanding is correct and I appreciate the discussion.