
Increasing PPOConfig training steps does not increase the number of training iterations

Open gjmulder opened this issue 2 years ago • 2 comments

Apologies in advance if I'm missing something obvious, but when I increase PPOConfig.steps from the default of 20000, PPO doesn't seem to train for more iterations. I'm running the example script ./examples/sentiment/scripts/t5-sentiment.py with accelerate on 4 GPUs.

I did a quick grep through the trl code and saw the only reference to steps here:

lib/trl/trainer/ppo_config.py: self.total_ppo_epochs = int(np.ceil(steps / batch_size))

However, total_ppo_epochs doesn't seem to be referenced anywhere else.
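For reference, the line quoted above just divides steps by the batch size and rounds up. A minimal sketch of that arithmetic, using math.ceil instead of np.ceil so it is self-contained, and an assumed batch_size of 256 (not stated in this issue, used only to make the numbers concrete):

```python
import math

steps = 20000      # the PPOConfig.steps default mentioned above
batch_size = 256   # hypothetical value, for illustration only

# Mirrors: self.total_ppo_epochs = int(np.ceil(steps / batch_size))
total_ppo_epochs = int(math.ceil(steps / batch_size))
print(total_ppo_epochs)  # → 79 for these example values
```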

Which PPOConfig parameter should I set to train for more than about 23 iterations?

gjmulder avatar Feb 20 '23 18:02 gjmulder

Indeed, total_ppo_epochs in the config is deprecated. If you want to train for multiple epochs, it's best to add an additional for-loop around the dataloader:

from tqdm import tqdm

for epoch in range(num_epochs):
    for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
        query_tensors = batch["input_ids"]
        ...

Agreed that this is not very clear in the examples - will update them and remove the total_ppo_epochs from the config.
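To make the effect of the outer loop concrete, here is a plain-Python stand-in for the pattern above. The dataloader is faked as a list of batches and step_count stands in for calls to the trainer's per-batch update; the batch count of 23 is borrowed from the iteration count reported in the question, purely for illustration:

```python
num_epochs = 3
# Fake dataloader: 23 batches per pass, each shaped like the real batch dict.
dataloader = [{"input_ids": [i]} for i in range(23)]

step_count = 0
for epoch in range(num_epochs):
    for step, batch in enumerate(dataloader):
        query_tensors = batch["input_ids"]
        step_count += 1  # one PPO optimization step per batch

print(step_count)  # → 69, i.e. num_epochs * len(dataloader)
```

The outer loop simply re-iterates the same dataloader, so total training iterations scale linearly with num_epochs.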

lvwerra avatar Feb 23 '23 18:02 lvwerra

Thanks for the reply. Should I leave this issue open to track your updates?

gjmulder avatar Feb 28 '23 17:02 gjmulder

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Jun 20 '23 15:06 github-actions[bot]