modular_rl
Will dropout break the final loss of the PPO algorithm?
If I add a dropout layer to the model, would that be a bad idea?
Are there any experiments on this?
I run the model in eval mode when exploring the environment, and in train mode for the policy, old policy, and value networks during training.
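Concretely, my setup looks roughly like the minimal PyTorch sketch below (the class and variable names are just illustrative, assuming a simple Gaussian policy; this is not modular_rl's actual code). My worry is that because dropout is off during rollout but on during the update, the stored log-probs no longer match what the network produces in train mode, so the importance ratio is not 1 even before any gradient step:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Gaussian policy with a dropout layer in the body (hypothetical)."""
    def __init__(self, obs_dim=4, act_dim=2, p_drop=0.5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Dropout(p_drop),          # the dropout layer in question
            nn.Linear(64, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.body(obs), self.log_std.exp())

policy = PolicyNet()
obs = torch.randn(8, 4)

# --- rollout: eval mode, dropout disabled ---
policy.eval()
with torch.no_grad():
    d = policy.dist(obs)
    actions = d.sample()
    logp_old = d.log_prob(actions).sum(-1)   # stored as the "old" log-probs

# --- update: train mode, dropout active ---
policy.train()
logp_new = policy.dist(obs).log_prob(actions).sum(-1)
ratio = (logp_new - logp_old).exp()

# With identical weights the ratio should be exactly 1 at the first
# epoch, but the eval/train dropout mismatch already perturbs it:
print(ratio)   # generally != 1, even though no gradient step was taken
```

Is this mismatch harmful to the clipped surrogate loss in practice, or does it wash out?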