trlx
Confused by the config parameters total_steps, epochs, batch_size and num_rollouts
📚 The doc issue
TrainConfig has a general explanation of some of the parameters, but after running ppo_hh.py I am still confused.
- ppo_hh.py sets both total_steps and epochs, but epochs seems to be ignored during training.
- How many examples does one step consume: num_rollouts or batch_size?
- My understanding is that every epoch uses num_rollouts examples, which are generated in batches of chunk_size, and that every step then uses batch_size examples to update the policy model. Am I right?
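To make my understanding concrete, here is a small sketch of the counts it would imply. This is only my assumption about how the parameters relate, not something taken from the trlx source. The function name `assumed_schedule` and the numbers passed to it are hypothetical and chosen purely for illustration.

```python
import math


def assumed_schedule(num_rollouts, chunk_size, batch_size, total_steps):
    """Counts implied by the interpretation above.

    ASSUMPTION: this mirrors my reading of the parameters, not confirmed
    trlx behavior.
    """
    # Rollouts are generated chunk_size prompts at a time.
    generation_chunks = math.ceil(num_rollouts / chunk_size)
    # Each optimizer step consumes batch_size of the collected rollouts.
    steps_per_rollout_phase = math.ceil(num_rollouts / batch_size)
    # Rollout phases needed before total_steps is reached.
    rollout_phases = math.ceil(total_steps / steps_per_rollout_phase)
    return generation_chunks, steps_per_rollout_phase, rollout_phases


# Illustrative values only.
print(assumed_schedule(num_rollouts=64, chunk_size=16, batch_size=4, total_steps=10000))
# prints (4, 16, 625)
```

If this reading is correct, total_steps would cap training long before a large epochs value matters, which might explain why epochs looks ignored.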
Thank you in advance for answering all the questions.
Suggest a potential alternative/fix
Maybe there should be a more detailed description of these parameters?