Deep-reinforcement-learning-with-pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Add a title "Deep-reinforcement-learning-with-pytorch" to readme.md
SAC_Bug
In sac.py: `s = torch.tensor([t.s for t in self.replay_buffer]).float().to(device)`
Traceback (most recent call last):
File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 307, in main()
File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 293, in main agent.update()
File...
SAC Bugs
In SAC.py and SAC_BipedalWalker-v2.py, the code:
```python
class NormalizedActions(gym.ActionWrapper):
    def _action(self, action):
        low = self.action_space.low
        high = self.action_space.high
        action = low + (action + 1.0) * 0.5 * (high - low)
        ...
```
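The rescaling line in `_action` maps a policy output in [-1, 1] onto the environment's action bounds. Here is a minimal NumPy sketch of that mapping and its inverse. The helper names `rescale_action` and `unscale_action` and the bounds of ±2 are hypothetical, chosen for illustration; this is not code from the repository.

```python
import numpy as np


def rescale_action(action, low, high):
    """Map an action from [-1, 1] to [low, high], then clip to the bounds."""
    action = np.asarray(action, dtype=np.float64)
    scaled = low + (action + 1.0) * 0.5 * (high - low)
    return np.clip(scaled, low, high)


def unscale_action(action, low, high):
    """Inverse mapping: from [low, high] back to [-1, 1]."""
    action = np.asarray(action, dtype=np.float64)
    unscaled = 2.0 * (action - low) / (high - low) - 1.0
    return np.clip(unscaled, -1.0, 1.0)


# Hypothetical bounds for a one-dimensional action space.
low = np.array([-2.0])
high = np.array([2.0])

env_action = rescale_action([0.5], low, high)
policy_action = unscale_action(env_action, low, high)
print(env_action, policy_action)
```

Clipping after rescaling keeps out-of-range policy outputs inside the environment's bounds.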
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.15.2 to 2.7.2. Release notes Sourced from tensorflow's releases. TensorFlow 2.7.2 Release 2.7.2 This release introduces several vulnerability fixes: Fixes a code injection in saved_model_cli (CVE-2022-29216) Fixes...
In `dist = Normal(mu, sigma)`, `sigma` must be positive, but the actor_net output can be negative, so `action_log_prob = dist.log_prob(action)` can be `nan`. Try: `import torch a...`
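One common way to keep `sigma` positive is to pass the raw network output through `softplus` and add a small constant. The sketch below assumes a hypothetical `GaussianActor` with made-up layer sizes; it is not the repository's actor_net, and the truncated suggestion in the issue may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal


class GaussianActor(nn.Module):
    """Hypothetical actor that outputs a mean and a strictly positive std."""

    def __init__(self, state_dim=3, hidden=32):
        super().__init__()
        self.fc = nn.Linear(state_dim, hidden)
        self.mu_head = nn.Linear(hidden, 1)
        self.sigma_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = F.relu(self.fc(x))
        mu = self.mu_head(h)
        # softplus is non-negative; the small offset keeps sigma away from zero.
        sigma = F.softplus(self.sigma_head(h)) + 1e-5
        return mu, sigma


torch.manual_seed(0)
actor = GaussianActor()
state = torch.randn(4, 3)  # batch of 4 made-up states

mu, sigma = actor(state)
dist = Normal(mu, sigma)
action = dist.sample()
action_log_prob = dist.log_prob(action)
```

An alternative with the same effect is to have the network predict `log_sigma` and use `sigma = log_sigma.exp()`, usually with the log value clamped to a fixed range.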
The value network update should be:
alpha_w = 1e-3  # initialize
optimizer_w = optim.Adam(**s_value_func**.parameters(), lr=alpha_w)
optimizer_w.zero_grad()
policy_loss_w = -delta
policy_loss_w.backward(retain_graph=True)
clip_grad_norm_(policy_loss_w, 0.1)
optimizer_w.step()
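A runnable sketch of a value-network update along these lines follows. It makes two assumptions that differ from the snippet in the issue: the loss is the squared TD error rather than `-delta`, and `clip_grad_norm_` is given the network's parameters, since that function clips parameter gradients and does not accept a loss tensor. The network shape, transition, and `gamma` are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import clip_grad_norm_

torch.manual_seed(0)

# Hypothetical state-value network; the name mirrors the issue text.
s_value_func = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

alpha_w = 1e-3  # learning rate for the value network
optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)

# Made-up transition (state, reward, next_state).
state = torch.randn(1, 4)
next_state = torch.randn(1, 4)
reward, gamma = 1.0, 0.99

# The TD target is treated as a constant, so no gradient flows through it.
with torch.no_grad():
    td_target = reward + gamma * s_value_func(next_state)

delta = td_target - s_value_func(state)  # TD error
value_loss = delta.pow(2).mean()         # assumption: squared TD error

optimizer_w.zero_grad()
value_loss.backward()
# Clip the gradients of the value network's parameters in place.
clip_grad_norm_(s_value_func.parameters(), 0.1)
optimizer_w.step()
```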
log_prob should be multiplied by temperature factor (alpha) when calculating pi_loss in ALL implementations of SAC.
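As a sketch of the point above, the standard SAC policy objective minimizes the mean of `alpha * log_prob - Q(s, a)`. The function name `sac_policy_loss`, the default `alpha=0.2`, and the tensor values are hypothetical, not taken from the repository.

```python
import torch


def sac_policy_loss(log_prob, q_value, alpha=0.2):
    """SAC policy loss: mean of alpha * log pi(a|s) - Q(s, a)."""
    return (alpha * log_prob - q_value).mean()


# Made-up batch of log-probabilities and Q-values.
log_prob = torch.tensor([-1.0, -2.0])
q_value = torch.tensor([0.5, 1.5])

pi_loss = sac_policy_loss(log_prob, q_value, alpha=0.2)
print(pi_loss.item())
```

With `alpha` fixed at 1 the formula reduces to `log_prob - q_value`, so omitting the factor only matters when the temperature differs from 1 or is learned.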
In Line 224: `args.max_length_of_trajectory` is missing. I know this refers to the maximum step length, but how big is this param usually?
The result for Hopper is still 40 after many steps. Is any hyperparameter tuning needed?
I don't know if it's because of my device or the program, but Pendulum-v0 just doesn't work well on my device. You see, the pendulum only moves one...