
Confused by the config parameters total_steps, epochs, batch_size and num_rollouts

Open • drxmy opened this issue 1 year ago • 0 comments

📚 The doc issue

TrainConfig gives a general explanation of some of these parameters, but after running ppo_hh.py, I got confused.

  1. ppo_hh.py sets both total_steps and epochs, but epochs seems to be ignored during training.
  2. How much data does a single step consume: num_rollouts or batch_size?
  3. My understanding is that each epoch uses num_rollouts examples, which are generated in batches of chunk_size. Each step then uses batch_size examples to update the policy model. Is that right?
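To make the arithmetic behind point 3 concrete, here is a minimal sketch, assuming that reading is correct and that training stops at whichever of total_steps or epochs is reached first. It does not reproduce trlx's real trainer: `plan_training` is a hypothetical helper and the example values are made up, not taken from ppo_hh.py.

```python
import math


def plan_training(total_steps: int, epochs: int, batch_size: int,
                  num_rollouts: int, chunk_size: int) -> dict:
    """Hypothetical helper modelling the reading in point 3, NOT trlx's loop.

    Assumes one epoch = num_rollouts freshly generated samples, produced in
    generation batches of chunk_size and consumed in optimizer steps of
    batch_size, and that training stops at the first limit reached.
    """
    # Generation passes needed to collect num_rollouts samples per epoch.
    generation_batches = math.ceil(num_rollouts / chunk_size)
    # Optimizer steps needed to consume one epoch's worth of rollouts.
    steps_per_epoch = math.ceil(num_rollouts / batch_size)
    # Assumed stopping rule: whichever of total_steps / epochs binds first.
    steps = min(total_steps, epochs * steps_per_epoch)
    epochs_run = math.ceil(steps / steps_per_epoch)
    return {
        "generation_batches_per_epoch": generation_batches,
        "steps_per_epoch": steps_per_epoch,
        "total_optimizer_steps": steps,
        "epochs_actually_run": epochs_run,
    }


plan = plan_training(total_steps=1000, epochs=10000, batch_size=8,
                     num_rollouts=64, chunk_size=16)
print(plan)
# {'generation_batches_per_epoch': 4, 'steps_per_epoch': 8, 'total_optimizer_steps': 1000, 'epochs_actually_run': 125}
```

Under these assumptions, a very large epochs value never becomes the binding limit: total_steps stops training after 125 of the 10000 epochs, which would look exactly like epochs being ignored (point 1).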

Thank you in advance for answering all the questions.

Suggest a potential alternative/fix

Perhaps these parameters could be documented in more detail?

drxmy, Apr 06 '23 03:04