
RewardTrainer multiple forward passes

Open IvanSedykh opened this issue 1 year ago • 5 comments

During RewardModel training we do 2 forward passes for each batch -- one for the chosen samples and the other for rejected ones. The link to the related code is below:

https://github.com/huggingface/trl/blob/a46cd84a6405312837f0d0e56fd1cf4d45585770/trl/trainer/reward_trainer.py#L228-L242

Wouldn't it be better (and probably more efficient in some cases) to concatenate these samples and perform a single forward pass with 2x the actual batch size, as DPOTrainer does? Maybe this could be added as an option.
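A minimal sketch of the idea, not the actual TRL implementation: `TinyRewardModel`, `pad_to`, and `concatenated_reward_loss` are hypothetical names, and the batch keys (`input_ids_chosen`, `attention_mask_chosen`, and the `_rejected` counterparts) are assumed to match the ones used in the linked code. The chosen and rejected sequences are padded to a common length, stacked along the batch dimension, scored in one forward pass, and then split back apart for the pairwise loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRewardModel(nn.Module):
    """Hypothetical stand-in for a sequence-classification reward model."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        h = self.embed(input_ids)  # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).to(h.dtype)
        # Masked mean pooling, so padded positions do not affect the score.
        pooled = (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return self.score(pooled)  # (batch, 1)


def pad_to(t, length, value):
    """Right-pad a (batch, seq) tensor along the sequence dimension."""
    return F.pad(t, (0, length - t.shape[1]), value=value)


def concatenated_reward_loss(model, batch, pad_token_id=0):
    """Pairwise reward loss computed with a single forward pass."""
    n = batch["input_ids_chosen"].shape[0]
    max_len = max(
        batch["input_ids_chosen"].shape[1],
        batch["input_ids_rejected"].shape[1],
    )
    # Stack chosen on top of rejected: the batch dimension becomes 2 * n.
    input_ids = torch.cat(
        [
            pad_to(batch["input_ids_chosen"], max_len, pad_token_id),
            pad_to(batch["input_ids_rejected"], max_len, pad_token_id),
        ],
        dim=0,
    )
    attention_mask = torch.cat(
        [
            pad_to(batch["attention_mask_chosen"], max_len, 0),
            pad_to(batch["attention_mask_rejected"], max_len, 0),
        ],
        dim=0,
    )
    rewards = model(input_ids=input_ids, attention_mask=attention_mask)
    rewards_chosen, rewards_rejected = rewards[:n], rewards[n:]
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()
```

The trade-off is that peak activation memory grows with the doubled batch, and padding both halves to a common length can waste compute when chosen and rejected lengths differ a lot, which is why an opt-in flag seems safer than changing the default.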

What do you think? Is it reasonable?

IvanSedykh avatar Feb 21 '24 11:02 IvanSedykh

It's totally reasonable and makes a lot of sense. Would you like to give it a try 🙏? I'll be happy to assist and guide you. It seems you just need to change that piece of code, right?

younesbelkada avatar Feb 27 '24 01:02 younesbelkada

Sure, I hope to prepare a PR later this week.

IvanSedykh avatar Feb 27 '24 08:02 IvanSedykh

Awesome @IvanSedykh, looking forward to it!

younesbelkada avatar Feb 27 '24 08:02 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Mar 22 '24 15:03 github-actions[bot]

Bump. The issue still persists; I will work on this later.

IvanSedykh avatar Mar 22 '24 15:03 IvanSedykh

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Apr 16 '24 15:04 github-actions[bot]