
Increasing PPOConfig training steps does not increase the number of training iterations

Open gjmulder opened this issue 2 years ago • 2 comments

Apologies in advance if I'm missing something obvious, but when I increase PPOConfig.steps from the default of 20000, PPO doesn't seem to train for more iterations. I'm running the example script ./examples/sentiment/scripts/t5-sentiment.py with accelerate on 4 GPUs.

I did a quick grep through the trl code and saw the only reference to steps here:

lib/trl/trainer/ppo_config.py: self.total_ppo_epochs = int(np.ceil(steps / batch_size))

However, total_ppo_epochs doesn't seem to be referenced anywhere else.
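For reference, the line quoted above just divides steps by the batch size and rounds up. A minimal sketch of that arithmetic, using math.ceil instead of np.ceil so it is self-contained, and an assumed batch_size of 256 (not stated in this issue, used only to make the numbers concrete):

```python
import math

steps = 20000      # the PPOConfig.steps default mentioned above
batch_size = 256   # hypothetical value, for illustration only

# Mirrors: self.total_ppo_epochs = int(np.ceil(steps / batch_size))
total_ppo_epochs = int(math.ceil(steps / batch_size))
print(total_ppo_epochs)  # → 79 for these example values
```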

Which PPOConfig parameter should I set to train for more than about 23 iterations?

gjmulder avatar Feb 20 '23 18:02 gjmulder

Indeed, total_ppo_epochs in the config is deprecated. If you want to train for multiple epochs, it's best to add an additional for-loop around the dataloader:

from tqdm import tqdm

for epoch in range(num_epochs):
    for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
        query_tensors = batch["input_ids"]
        ...

Agreed that this is not very clear in the examples - will update them and remove the total_ppo_epochs from the config.
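To make the effect of the outer loop concrete, here is a plain-Python stand-in for the pattern above. The dataloader is faked as a list of batches and step_count stands in for calls to the trainer's per-batch update; the batch count of 23 is borrowed from the iteration count reported in the question, purely for illustration:

```python
num_epochs = 3
# Fake dataloader: 23 batches per pass, each shaped like the real batch dict.
dataloader = [{"input_ids": [i]} for i in range(23)]

step_count = 0
for epoch in range(num_epochs):
    for step, batch in enumerate(dataloader):
        query_tensors = batch["input_ids"]
        step_count += 1  # one PPO optimization step per batch

print(step_count)  # → 69, i.e. num_epochs * len(dataloader)
```

The outer loop simply re-iterates the same dataloader, so total training iterations scale linearly with num_epochs.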

lvwerra avatar Feb 23 '23 18:02 lvwerra

Thanks for the reply. Should I leave this issue open to track your updates?

gjmulder avatar Feb 28 '23 17:02 gjmulder

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Jun 20 '23 15:06 github-actions[bot]