Increasing PPOConfig training steps does not increase the number of training iterations
Apologies in advance if I'm missing something obvious, but when I increase PPOConfig.steps from the default of 20000, PPO doesn't seem to train for more iterations. I'm running with accelerate on 4 GPUs and the example script ./examples/sentiment/scripts/t5-sentiment.py.
I did a quick grep through the trl code and found the only reference to steps here:

lib/trl/trainer/ppo_config.py: self.total_ppo_epochs = int(np.ceil(steps / batch_size))

However, total_ppo_epochs doesn't seem to be referenced anywhere else.
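For what it's worth, evaluating that formula with the defaults (assuming batch_size defaults to 256 — I haven't double-checked that value) gives 79, nowhere near the ~23 iterations I actually see, which makes me think the value is simply never consumed:

import numpy as np

steps, batch_size = 20000, 256  # assumed defaults, for illustration only
total_ppo_epochs = int(np.ceil(steps / batch_size))  # == 79, but nothing ever reads this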
Which PPOConfig field should I set to train for more than about 23 iterations?
Indeed, total_ppo_epochs in the config is deprecated. If you want to train for multiple epochs, it's best to add an additional for-loop around the dataloader:
from tqdm import tqdm

for epoch in range(num_epochs):
    for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
        query_tensors = batch["input_ids"]
        ...
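For reference, the body of that loop then follows the usual pattern from the sentiment example: generate responses, score them, and run a PPO step. A minimal sketch of that pattern — note that num_epochs, generation_kwargs, and tokenizer are assumed to be defined earlier in your script, and compute_rewards is a hypothetical stand-in for whatever reward model you use (e.g. the sentiment pipeline):

import torch
from tqdm import tqdm

for epoch in range(num_epochs):
    for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
        query_tensors = batch["input_ids"]

        # Generate responses from the current policy model
        response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
        batch["response"] = tokenizer.batch_decode(response_tensors)

        # Score each query/response pair; compute_rewards is a hypothetical
        # placeholder for your reward function, returning one float per sample
        rewards = [torch.tensor(r) for r in compute_rewards(batch["query"], batch["response"])]

        # Run one PPO optimization step and log the resulting stats
        stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
        ppo_trainer.log_stats(stats, batch, rewards)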
Agreed that this is not very clear in the examples; I will update them and remove total_ppo_epochs from the config.
Thanks for the reply. Should I leave this issue open to track your updates?