trl RewardTrainer multiple forward passes

Open IvanSedykh opened this issue 1 year ago • 5 comments

During RewardModel training we do 2 forward passes for each batch: one for the chosen samples and one for the rejected ones. The related code is linked below:

https://github.com/huggingface/trl/blob/a46cd84a6405312837f0d0e56fd1cf4d45585770/trl/trainer/reward_trainer.py#L228-L242

Wouldn't it be better (and probably more efficient in some cases) to concatenate these samples and perform a single forward pass with 2x the actual batch size, as DPOTrainer does? Maybe this could be added as an option.

What do you think? Is it reasonable?
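
To make the proposal concrete, here is a minimal, framework-free sketch of the two variants. It is not the trl implementation: `ToyRewardModel`, `pad_to`, `PAD_ID` and the other helper names are hypothetical stand-ins, and the loss is assumed to be the usual pairwise `-log(sigmoid(r_chosen - r_rejected))`. The point it illustrates is that chosen and rejected sequences must be padded to a common length before concatenation, and that the rewards are split back into two halves afterwards.

```python
import math

PAD_ID = 0  # hypothetical padding token id


def pad_to(seqs, length, pad_id=PAD_ID):
    """Right-pad every sequence to `length`; return (input_ids, attention_mask)."""
    ids = [s + [pad_id] * (length - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (length - len(s)) for s in seqs]
    return ids, mask


class ToyRewardModel:
    """Stand-in for a reward model: one scalar per sequence, counts forward calls."""

    def __init__(self):
        self.forward_calls = 0

    def __call__(self, input_ids, attention_mask):
        self.forward_calls += 1
        # Masked mean of token ids, so padding does not change the reward.
        return [
            sum(t * m for t, m in zip(ids, mask)) / (10.0 * sum(mask))
            for ids, mask in zip(input_ids, attention_mask)
        ]


def pairwise_loss(r_chosen, r_rejected):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over the batch."""
    losses = [math.log1p(math.exp(-(c - r))) for c, r in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)


def loss_two_passes(model, chosen, rejected):
    """Current behaviour as described above: one forward pass per side."""
    c_ids, c_mask = pad_to(chosen, max(map(len, chosen)))
    r_ids, r_mask = pad_to(rejected, max(map(len, rejected)))
    return pairwise_loss(model(c_ids, c_mask), model(r_ids, r_mask))


def loss_single_pass(model, chosen, rejected):
    """Proposed behaviour: concatenate both sides, run one pass, split the rewards."""
    n = len(chosen)
    length = max(map(len, chosen + rejected))  # common padded length
    ids, mask = pad_to(chosen + rejected, length)
    rewards = model(ids, mask)  # batch of size 2 * n
    return pairwise_loss(rewards[:n], rewards[n:])


chosen = [[5, 7, 9], [4, 4]]
rejected = [[1, 2], [3, 1, 1, 1]]
two_pass_model, one_pass_model = ToyRewardModel(), ToyRewardModel()
loss_a = loss_two_passes(two_pass_model, chosen, rejected)
loss_b = loss_single_pass(one_pass_model, chosen, rejected)
```

Both variants give the same loss here because the toy model respects the attention mask. With a real model the trade-off would presumably be one larger forward pass (higher peak memory, possibly extra padding) against two smaller ones, which is why an option seems safer than changing the default.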

Feb 21 '24 11:02 IvanSedykh

It's totally reasonable and makes a lot of sense. Would you like to give it a try 🙏? I'll be happy to assist / guide you; it seems you just need to change that piece of code, right?

Feb 27 '24 01:02 younesbelkada

Sure, I hope to prepare a PR later this week.

Feb 27 '24 08:02 IvanSedykh

Awesome @IvanSedykh, looking forward to it!

Feb 27 '24 08:02 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Mar 22 '24 15:03 github-actions[bot]

Bump. The issue still persists; I will work on this later.

Mar 22 '24 15:03 IvanSedykh

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Apr 16 '24 15:04 github-actions[bot]
