pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch

4 pytorch-ddpg issues

Hi Guan-Horng, thanks for your great implementation! I am wondering why we append an additional (s, a, r) pair to the replay buffer after an episode is done. The reward...
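
For context on why the transition stored at an episode boundary matters in DDPG: the critic's TD target bootstraps from the next state only when the episode has not terminated, so whatever is appended when `done` fires must carry a correct terminal flag. A minimal sketch of that target computation; the names (`actor_target`, `critic_target`, `gamma`) are illustrative, not the repo's identifiers:

```python
import torch

# Hypothetical sketch of the DDPG critic target, not the repo's actual code.
def td_target(reward, next_state, done, actor_target, critic_target, gamma=0.99):
    # `done` masks out the bootstrap term, so transitions appended at
    # episode boundaries must carry an accurate terminal flag.
    with torch.no_grad():
        next_q = critic_target(next_state, actor_target(next_state))
    return reward + gamma * (1.0 - done) * next_q
```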

Traceback (most recent call last):
  File "D:\Master\Codes\pytorch-ddpg\main.py", line 156, in
    train(args.train_iter, agent, env, evaluate,
  File "D:\Master\Codes\pytorch-ddpg\main.py", line 44, in train
    observation2, reward, done, info = env.step(action)
  File "D:\AI\Software\Conda\Miniconda\envs\torch\lib\site-packages\gym\core.py", line 349,...
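
The traceback is cut off before the actual exception, but a failure at this `env.step` call site is commonly caused by the gym 0.26+ API change, where `step` returns five values (`observation`, `reward`, `terminated`, `truncated`, `info`) instead of four. A hedged compatibility sketch, assuming that is the cause here:

```python
# Handle both the old 4-tuple and the new 5-tuple gym step API.
# (Assumption: the error stems from the gym >= 0.26 API change; the
# truncated traceback does not show the actual exception.)
step_result = env.step(action)
if len(step_result) == 5:
    observation2, reward, terminated, truncated, info = step_result
    done = terminated or truncated  # new API splits done into two flags
else:
    observation2, reward, done, info = step_result
```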

Hi, thank you for this great implementation! However, I'm not sure about the effect of normalized_env.py. If I remove it, the results seem to be worse than not...
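
For reference, action-normalization wrappers in DDPG codebases typically rescale the actor's tanh output in [-1, 1] onto the environment's actual action bounds. The sketch below is an assumption about what normalized_env.py does, not a copy of it:

```python
import gym

class NormalizedEnv(gym.ActionWrapper):
    """Hypothetical action-rescaling wrapper (assumed behavior of
    normalized_env.py): maps actor outputs in [-1, 1] to [low, high]."""

    def action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return low + (action + 1.0) * 0.5 * (high - low)

    def reverse_action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return 2.0 * (action - low) / (high - low) - 1.0
```

If the wrapper does work this way, removing it would let the actor's raw outputs fall outside the environment's action space, which could explain the degraded results.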

Hi, I'm not sure whether this computes the gradient of the action-value with respect to the actions: policy_loss = -self.critic([ to_tensor(state_batch), self.actor(to_tensor(state_batch)) ])
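
It does, via autograd: calling backward() on this loss backpropagates through the critic's action input into the actor's parameters, which is exactly the deterministic policy gradient chain grad_a Q(s, a) * grad_theta mu(s), with no manual gradient of Q with respect to actions needed. A minimal sketch of the actor update with illustrative names (`actor`, `critic`, `actor_optim` are assumptions, not the repo's identifiers):

```python
# Illustrative DDPG actor update; names are assumptions, not the repo's.
actor_optim.zero_grad()
policy_loss = -critic(state_batch, actor(state_batch)).mean()
policy_loss.backward()  # autograd chains dQ/da into the actor's parameters
actor_optim.step()      # only actor params live in actor_optim, so the
                        # critic's weights are untouched by this step
```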