ColossalAI

The results of PPO training are not so good

Open • guijuzhejiang opened this issue 2 years ago • 1 comment

PPO training has difficulty converging. The problem seems related to the hyperparameters num_episodes, max_epochs, max_timesteps, and update_timesteps. How would you recommend setting these parameters?
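When tuning these four values, it can help to see how they typically nest. The following is a hypothetical, model-free simulation. It assumes a common on-policy layout: one batch of experience is collected per timestep, a PPO update runs every update_timesteps collection steps, each update makes max_epochs passes over the replay buffer, and the buffer is then cleared. This is not ColossalAI's actual trainer code. PPOSchedule and simulate are illustrative names. The sketch only counts how much collection and training each setting produces.

```python
from dataclasses import dataclass


@dataclass
class PPOSchedule:
    """Hypothetical container for the four loop counters named in the issue."""
    num_episodes: int      # outer loop: number of episodes
    max_timesteps: int     # experience-collection steps per episode
    update_timesteps: int  # run a PPO update every this many collection steps
    max_epochs: int        # passes over the replay buffer per update


def simulate(schedule: PPOSchedule) -> dict:
    """Count the work done by a simplified PPO loop (assumed layout, no model)."""
    step = 0          # global collection-step counter across all episodes
    updates = 0       # number of PPO update phases that actually ran
    epoch_passes = 0  # total passes over the replay buffer
    buffer_size = 0   # experience batches waiting in the buffer
    for _episode in range(schedule.num_episodes):
        for _t in range(schedule.max_timesteps):
            step += 1
            buffer_size += 1  # collect one batch of rollouts
            if step % schedule.update_timesteps == 0:
                updates += 1
                epoch_passes += schedule.max_epochs  # train max_epochs times on the buffer
                buffer_size = 0  # on-policy: discard experience after the update
    return {
        "collection_steps": step,
        "updates": updates,
        "epoch_passes": epoch_passes,
        "leftover_steps": buffer_size,  # collected but never trained on
    }


if __name__ == "__main__":
    print(simulate(PPOSchedule(num_episodes=3, max_timesteps=10,
                               update_timesteps=8, max_epochs=5)))
    print(simulate(PPOSchedule(num_episodes=1, max_timesteps=10,
                               update_timesteps=20, max_epochs=5)))
```

Under this assumed layout, the first call yields 30 collection steps, 3 updates, 15 epoch passes, and 6 leftover steps that are never trained on. The second call never triggers an update, because update_timesteps exceeds num_episodes * max_timesteps. Raising max_epochs adds more gradient passes on the same data without collecting fresh rollouts. Checking these counts is a cheap sanity test before tuning anything else.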

guijuzhejiang • Apr 13 '23 01:04

Hi @guijuzhejiang, many factors and parameters are involved, so it's hard to give a single best (or even reliably good) recommendation for them. We are still experimenting with different models, datasets, and settings ourselves. You're welcome to share your results and insights. Thanks.

binmakeswell • Apr 17 '23 10:04