trl
RewardTrainer multiple forward passes
During RewardModel training we do 2 forward passes for each batch -- one for the chosen samples and the other for rejected ones. The link to the related code is below:
https://github.com/huggingface/trl/blob/a46cd84a6405312837f0d0e56fd1cf4d45585770/trl/trainer/reward_trainer.py#L228-L242
Wouldn't it be better (and probably more efficient in some cases) to concatenate these samples and perform a single forward pass with 2x the actual batch size, as DPOTrainer does?
Maybe this could be added as an option.
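To make the idea concrete, here is a rough sketch of what I have in mind. It is not the actual trl code: `ToyRewardModel`, `pad_to`, and `concatenated_rewards` are hypothetical names made up for illustration, the toy model just stands in for a real sequence-classification reward model, and the batch keys are assumed to match the ones used in the linked snippet. Chosen and rejected sequences are padded to a common length, stacked along the batch dimension, scored in one forward pass, and split back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyRewardModel(nn.Module):
    """Hypothetical stand-in for a reward model: masked mean pooling + linear head."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        h = self.embed(input_ids)                          # (B, T, H)
        mask = attention_mask.unsqueeze(-1).to(h.dtype)    # (B, T, 1)
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        return self.score(pooled)                          # (B, 1)


def pad_to(x, length, value):
    """Right-pad a (B, T) tensor along the sequence dimension to `length`."""
    return F.pad(x, (0, length - x.shape[1]), value=value)


def concatenated_rewards(model, batch, pad_token_id=0):
    """Score chosen and rejected samples in a single forward pass."""
    n = batch["input_ids_chosen"].shape[0]
    max_len = max(batch["input_ids_chosen"].shape[1],
                  batch["input_ids_rejected"].shape[1])
    input_ids = torch.cat([
        pad_to(batch["input_ids_chosen"], max_len, pad_token_id),
        pad_to(batch["input_ids_rejected"], max_len, pad_token_id),
    ])
    attention_mask = torch.cat([
        pad_to(batch["attention_mask_chosen"], max_len, 0),
        pad_to(batch["attention_mask_rejected"], max_len, 0),
    ])
    rewards = model(input_ids=input_ids, attention_mask=attention_mask)
    # First n rows belong to the chosen samples, the rest to the rejected ones.
    return rewards[:n], rewards[n:]


def pairwise_loss(rewards_chosen, rewards_rejected):
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected), averaged."""
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()


torch.manual_seed(0)
model = ToyRewardModel()
batch = {
    "input_ids_chosen": torch.randint(1, 100, (4, 5)),
    "attention_mask_chosen": torch.ones(4, 5, dtype=torch.long),
    "input_ids_rejected": torch.randint(1, 100, (4, 7)),
    "attention_mask_rejected": torch.ones(4, 7, dtype=torch.long),
}
rewards_chosen, rewards_rejected = concatenated_rewards(model, batch)
loss = pairwise_loss(rewards_chosen, rewards_rejected)
```

The single pass holds activations for 2x the batch at once, so peak memory goes up. That is the main reason I would make this opt-in rather than the default.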
What do you think? Is it reasonable?
It's totally reasonable and makes a lot of sense. Would you like to give it a try 🙏? I'll be happy to assist and guide you; it seems you just need to change that piece of code, right?
Sure, I hope to prepare a PR later this week.
Awesome @IvanSedykh, looking forward to it!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Bump. The issue still persists; I will work on this later.