trlx



Confused by the config parameters total_steps, epochs, batch_size and num_rollouts

Open drxmy opened this issue 1 year ago • 0 comments

📚 The doc issue

TrainConfig has a general explanation of some of the parameters, but after running ppo_hh.py I got confused.

ppo_hh.py sets both total_steps and epochs, but epochs seems to be ignored during training.
How many examples does one step consume: num_rollouts or batch_size?
My understanding is that every epoch uses num_rollouts examples, which are generated in batches of chunk_size, and that every step then uses batch_size examples to update the policy model. Am I right?
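
To make that understanding concrete, here is a minimal sketch of the arithmetic I have in mind. It is not trlx code: the helper name rollout_arithmetic and the example values are hypothetical, and it only encodes my assumption about how num_rollouts, chunk_size and batch_size relate, which is exactly what I would like confirmed or corrected.

```python
import math


def rollout_arithmetic(num_rollouts: int, chunk_size: int, batch_size: int) -> dict:
    """Hypothetical helper (not part of trlx) encoding the mental model above.

    Assumption 1: each epoch collects num_rollouts examples, generated
                  chunk_size prompts at a time.
    Assumption 2: each optimizer step consumes batch_size of those examples.
    """
    # Number of generation calls needed to collect one epoch of rollouts.
    generation_chunks = math.ceil(num_rollouts / chunk_size)
    # Number of policy updates needed to pass once over the collected rollouts.
    optimizer_steps_per_pass = math.ceil(num_rollouts / batch_size)
    return {
        "generation_chunks": generation_chunks,
        "optimizer_steps_per_pass": optimizer_steps_per_pass,
    }


# Illustrative values only; they are not taken from ppo_hh.py.
print(rollout_arithmetic(num_rollouts=64, chunk_size=16, batch_size=4))
# → {'generation_chunks': 4, 'optimizer_steps_per_pass': 16}
```

If this model is right, total_steps would count the optimizer steps, so the number of rollout phases actually run would follow from total_steps rather than from epochs. That part is a guess on my side.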

Thank you in advance for answering all the questions.

Suggest a potential alternative/fix

Maybe these parameters could be described in more detail in the documentation?

Apr 06 '23 03:04 drxmy