pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch

4 pytorch-ddpg issues

Hi Guan-Horng, thanks for your great implementation! I am wondering why we append an additional (s, a, r) pair to the replay buffer after an episode is done. The reward...
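
For context on why the transition stored at an episode boundary matters in DDPG: the critic's TD target bootstraps from the next state only when the episode has not terminated, so whatever is appended when `done` fires must carry a correct terminal flag. A minimal sketch of that target computation; the names (`actor_target`, `critic_target`, `gamma`) are illustrative, not the repo's identifiers:

```python
import torch

# Hypothetical sketch of the DDPG critic target, not the repo's actual code.
def td_target(reward, next_state, done, actor_target, critic_target, gamma=0.99):
    # `done` masks out the bootstrap term, so transitions appended at
    # episode boundaries must carry an accurate terminal flag.
    with torch.no_grad():
        next_q = critic_target(next_state, actor_target(next_state))
    return reward + gamma * (1.0 - done) * next_q
```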

Traceback (most recent call last):
  File "D:\Master\Codes\pytorch-ddpg\main.py", line 156, in
    train(args.train_iter, agent, env, evaluate,
  File "D:\Master\Codes\pytorch-ddpg\main.py", line 44, in train
    observation2, reward, done, info = env.step(action)
  File "D:\AI\Software\Conda\Miniconda\envs\torch\lib\site-packages\gym\core.py", line 349,...
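
The traceback is cut off before the actual exception, but a failure at this `env.step` call site is commonly caused by the gym 0.26+ API change, where `step` returns five values (`observation`, `reward`, `terminated`, `truncated`, `info`) instead of four. A hedged compatibility sketch, assuming that is the cause here:

```python
# Handle both the old 4-tuple and the new 5-tuple gym step API.
# (Assumption: the error stems from the gym >= 0.26 API change; the
# truncated traceback does not show the actual exception.)
step_result = env.step(action)
if len(step_result) == 5:
    observation2, reward, terminated, truncated, info = step_result
    done = terminated or truncated  # new API splits done into two flags
else:
    observation2, reward, done, info = step_result
```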

Hi, thank you for this great implementation! However, I'm not sure about the effect of normalized_env.py. If I remove it, the results seem to be worse than not...
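
For reference, action-normalization wrappers in DDPG codebases typically rescale the actor's tanh output in [-1, 1] onto the environment's actual action bounds. The sketch below is an assumption about what normalized_env.py does, not a copy of it:

```python
import gym

class NormalizedEnv(gym.ActionWrapper):
    """Hypothetical action-rescaling wrapper (assumed behavior of
    normalized_env.py): maps actor outputs in [-1, 1] to [low, high]."""

    def action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return low + (action + 1.0) * 0.5 * (high - low)

    def reverse_action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return 2.0 * (action - low) / (high - low) - 1.0
```

If the wrapper does work this way, removing it would let the actor's raw outputs fall outside the environment's action space, which could explain the degraded results.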

Hi, I'm not sure whether this computes the gradient of the action-value with respect to the actions: policy_loss = -self.critic([ to_tensor(state_batch), self.actor(to_tensor(state_batch)) ])
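
It does, via autograd: calling backward() on this loss backpropagates through the critic's action input into the actor's parameters, which is exactly the deterministic policy gradient chain grad_a Q(s, a) * grad_theta mu(s), with no manual gradient of Q with respect to actions needed. A minimal sketch of the actor update with illustrative names (`actor`, `critic`, `actor_optim` are assumptions, not the repo's identifiers):

```python
# Illustrative DDPG actor update; names are assumptions, not the repo's.
actor_optim.zero_grad()
policy_loss = -critic(state_batch, actor(state_batch)).mean()
policy_loss.backward()  # autograd chains dQ/da into the actor's parameters
actor_optim.step()      # only actor params live in actor_optim, so the
                        # critic's weights are untouched by this step
```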