
Concise PyTorch implementations of DRL algorithms, including REINFORCE, A2C, DQN, PPO (discrete and continuous), DDPG, TD3, and SAC.

15 DRL-code-pytorch issues

Nice experiments on PPO tricks. I've been trying to use PPO on PyBullet envs, but I find many of the tricks used in this repo are actually detrimental. (I have created my...

![Screenshot from 2022-10-21 12-05-48](https://user-images.githubusercontent.com/87897172/197171025-a9de6183-3f85-4c9a-84cc-f65866ba28b9.png) As the picture shows, the reward curve fluctuates around -120. I did not change anything, so I am confused about this result.

Hello, thanks for the code; it has helped me a lot. However, I ran into a problem using PPO. I am a beginner, and after connecting the continuous PPO algorithm to my custom environment, I found that every episode gets exactly the same reward. The actions output by the network differ, but only by a very small amount, and I cannot figure out what went wrong.
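A common cause of this symptom, when the sampled actions differ only slightly yet every episode's reward is identical, is that the Gaussian policy's mean sits outside the action bounds, so clipping maps all of the nearly identical samples to the same boundary value. The snippet below is a hypothetical diagnostic (the means, std, and bounds are made-up values, not taken from this repo) illustrating how distinct means can still yield identical executed actions:

```python
import torch
from torch.distributions import Normal

# Hypothetical diagnostic: three different policy means, all outside the
# action range [-1, 1], with a near-collapsed std. After sampling and
# clipping, the executed actions are indistinguishable, so every episode
# sees the same behavior and the same reward.
mean = torch.tensor([3.2, 3.1, 3.3])   # assumed means, outside the bounds
std = torch.tensor([1e-4])             # assumed (collapsed) std
actions = Normal(mean, std).sample().clamp(-1.0, 1.0)
print(actions)  # all entries clipped to 1.0 despite different means
```

If this matches what you see, check the scale of your environment's observations and the policy's output range (e.g. whether a `tanh` squashing or action rescaling step is missing for your custom action bounds).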

How should the PPO code be used with a hybrid discrete-continuous action space (one discrete action variable and one continuous one)?
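One common way to handle this, sketched below under the assumption that the discrete and continuous actions are conditionally independent given the state (the `HybridPolicy` class and its layer sizes are illustrative, not part of this repo), is to give the policy two heads, a `Categorical` head and a `Normal` head, and sum their log-probabilities to get the joint log-probability that PPO's ratio needs:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridPolicy(nn.Module):
    """Hypothetical two-headed policy: one discrete + one continuous action."""
    def __init__(self, state_dim, n_discrete, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.discrete_logits = nn.Linear(hidden, n_discrete)  # discrete branch
        self.mu = nn.Linear(hidden, 1)                        # continuous mean
        self.log_std = nn.Parameter(torch.zeros(1))           # state-independent std

    def forward(self, s):
        h = self.body(s)
        d_dist = Categorical(logits=self.discrete_logits(h))
        c_dist = Normal(self.mu(h), self.log_std.exp())
        return d_dist, c_dist

policy = HybridPolicy(state_dim=4, n_discrete=3)
s = torch.randn(8, 4)                       # a batch of 8 states
d_dist, c_dist = policy(s)
a_d, a_c = d_dist.sample(), c_dist.sample()
# joint log-prob = sum of the branch log-probs (independence assumption)
logp = d_dist.log_prob(a_d) + c_dist.log_prob(a_c).squeeze(-1)
```

The rest of the PPO update is unchanged: store `logp` at rollout time and recompute it the same way at update time to form the probability ratio.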

In the ppo-discrete-RNN code, shouldn't the RNN's hidden states be stored in the buffer and then restored when updating? From what I can see, the code resets the hidden state every time a mini-batch is sampled. Is that correct?
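Both schemes mentioned in the question can be valid; which one is needed depends on where the sampled sequences start. The sketch below (layer sizes and shapes are illustrative assumptions, not the repo's actual dimensions) shows the zero-reset variant, which is correct as long as each mini-batch sequence begins at an episode boundary, because the GRU then rebuilds the real history by unrolling over the whole sequence:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=4, hidden_size=8, batch_first=True)

# Variant A (zero reset): sample whole sequences that start at episode
# boundaries, reset the hidden state to zeros, and unroll over the full
# sequence, so out[:, t] still conditions on steps 0..t of that episode.
seq = torch.randn(16, 32, 4)   # (n_sequences, seq_len, obs_dim) -- assumed shapes
h0 = torch.zeros(1, 16, 8)     # fresh hidden state per sequence
out, _ = gru(seq, h0)

# Variant B (store and restore): save the hidden state active at each step
# during the rollout and restore it at update time. This is only required
# when mini-batch sequences may begin mid-episode, where a zero reset
# would discard history the policy actually used.
```

So resetting per mini-batch is not automatically a bug; the thing to verify is that the sampled sequences are aligned with episode starts.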