ColossalAI
The results of PPO training are not so good
PPO training has trouble converging. I suspect this is related to the hyperparameters num_episodes, max_epochs, max_timesteps, and update_timesteps. How do you recommend setting them?
Hi @guijuzhejiang, many factors and parameters affect convergence, so it's hard to recommend a single best setting. We are still experimenting with different models and datasets ourselves. You are welcome to share your results and insights. Thanks.
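As a rough aid for reasoning about how the four parameters interact, the sketch below counts how many experience-collection steps and optimization passes a given setting produces. It assumes a loop layout in which each episode runs max_timesteps collection steps, a learning phase is triggered every update_timesteps steps (counted across episodes), and each learning phase makes max_epochs passes over the collected buffer. That layout is an assumption for illustration, not a confirmed description of the trainer, and ppo_schedule is a hypothetical helper, not part of any library.

```python
def ppo_schedule(num_episodes, max_timesteps, update_timesteps, max_epochs):
    """Count rollout steps and training passes for an ASSUMED PPO loop layout.

    Assumption: a global timestep counter runs across episodes, and a
    learning phase starts whenever it is a multiple of update_timesteps.
    """
    total_timesteps = 0
    learn_phases = 0
    for _ in range(num_episodes):
        for _ in range(max_timesteps):
            total_timesteps += 1  # one experience-collection step
            if total_timesteps % update_timesteps == 0:
                learn_phases += 1  # train on the buffer, then clear it
    return {
        "rollout_steps": total_timesteps,
        "learn_phases": learn_phases,
        # each learning phase makes max_epochs passes over the buffer
        "epoch_passes": learn_phases * max_epochs,
    }


if __name__ == "__main__":
    print(ppo_schedule(num_episodes=10, max_timesteps=10,
                       update_timesteps=10, max_epochs=5))
```

Under this assumed layout, raising update_timesteps gives each learning phase more data but fewer updates overall, while raising max_epochs reuses the same data more times per update, which can make training less stable if pushed too far.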