DRL-code-pytorch
Concise PyTorch implementations of DRL algorithms, including REINFORCE, A2C, DQN, PPO (discrete and continuous), DDPG, TD3, and SAC.
Nice experiments on PPO tricks. I've been trying to use PPO on PyBullet envs, but I find that many of the tricks used in this repo are actually detrimental. (I have created my...
As the picture shows, the curve fluctuates around -120. I did not change anything, so I am confused by this result.
PPO training issue
Hello, and thanks for providing this code; it has been a great help to me. However, I ran into a problem when using PPO. I am a beginner, and during training I found that after connecting the continuous PPO algorithm to my custom environment, the reward of every episode is exactly the same. The actions output by the network are different, but only by a very small amount, and I don't know what went wrong.
How should the PPO code be used with a hybrid discrete-continuous action space (one discrete action variable and one continuous one)?
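A common approach (not implemented in this repo; the class and parameter names below are illustrative assumptions) is to give the actor two output heads sharing one trunk: a Categorical head for the discrete action and a Gaussian head for the continuous one, and to sum their log-probabilities when forming the PPO ratio, which assumes the two actions are conditionally independent given the observation:

```python
# Minimal sketch of a hybrid-action PPO actor: one Categorical head
# (discrete action) plus one Gaussian head (continuous action).
# HybridActor and its dimensions are illustrative, not the repo's code.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridActor(nn.Module):
    def __init__(self, obs_dim, n_discrete, cont_dim):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.logits = nn.Linear(64, n_discrete)       # discrete head
        self.mean = nn.Linear(64, cont_dim)           # continuous head (mean)
        self.log_std = nn.Parameter(torch.zeros(cont_dim))  # state-independent std

    def forward(self, obs):
        h = self.trunk(obs)
        return Categorical(logits=self.logits(h)), Normal(self.mean(h), self.log_std.exp())

actor = HybridActor(obs_dim=4, n_discrete=3, cont_dim=1)
cat_dist, gauss_dist = actor(torch.randn(1, 4))
a_d = cat_dist.sample()                               # discrete action, shape (1,)
a_c = gauss_dist.sample()                             # continuous action, shape (1, 1)
# Joint log-prob under the independence assumption: sum the two heads.
logp = cat_dist.log_prob(a_d) + gauss_dist.log_prob(a_c).sum(-1)
```

The rest of the PPO update (clipped surrogate, GAE, value loss) stays unchanged; only the joint `logp` replaces the single-distribution log-probability in the ratio.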
In the ppo-discrete-RNN code, shouldn't the RNN hidden state be stored in the buffer and then retrieved during the update to restore the RNN's state? I see that the code resets the hidden state once per mini-batch; is that correct?
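For context, resetting the hidden state per mini-batch is only consistent if each mini-batch row is a whole sequence starting from its first timestep, so the fresh zero state matches what the policy saw at collection time; the alternative is to store per-step hidden states in the buffer, as the question suggests. A minimal sketch of the whole-sequence replay pattern (illustrative, assuming sequences are stored episode-by-episode; not the repo's exact code):

```python
# Replaying an RNN policy over stored full sequences during the PPO update:
# the hidden state is zeroed once at the start of each sequence and then
# carried forward through the timesteps by the GRU itself.
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

def evaluate_sequences(obs_seq):
    # obs_seq: (batch, seq_len, obs_dim) -- one full episode per row.
    h0 = torch.zeros(1, obs_seq.size(0), 16)  # reset only at sequence start
    out, _ = rnn(obs_seq, h0)                 # state carried across all steps
    return out                                # (batch, seq_len, hidden)

feats = evaluate_sequences(torch.randn(2, 5, 8))
```

If transitions were instead shuffled into mini-batches of arbitrary mid-episode steps, a zeroed `h0` would no longer match the collection-time state, and the stored per-step hidden states would need to be restored instead.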