Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

32 Deep-reinforcement-learning-with-pytorch issues

Add a title "Deep-reinforcement-learning-with-pytorch" to readme.md

In SAC.py, `s = torch.tensor([t.s for t in self.replay_buffer]).float().to(device)` fails:

```
Traceback (most recent call last):
  File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 307, in <module>
    main()
  File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 293, in main
    agent.update()
  File...
```
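The traceback above is cut off, but a frequent cause of that exact line failing (or being extremely slow) is passing a Python list of numpy arrays straight to `torch.tensor`. A minimal sketch of a workaround, assuming the replay buffer stores transitions whose `t.s` fields are numpy state arrays (names follow the snippet above; `batch_states` is a hypothetical helper):

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical helper: stack the per-transition numpy state arrays first and
# convert once, rather than handing torch.tensor a Python list of arrays.
def batch_states(replay_buffer):
    states = np.array([t.s for t in replay_buffer], dtype=np.float32)
    return torch.from_numpy(states).to(device)
```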

In SAC.py and SAC_BipedalWalker-v2.py, the code:

```python
class NormalizedActions(gym.ActionWrapper):
    def _action(self, action):
        low = self.action_space.low
        high = self.action_space.high
        action = low + (action + 1.0) * 0.5 * (high - low)
        ...
```
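The preview is truncated, so the exact complaint is unclear; for reference, a sketch of a complete wrapper is below, assuming the concern is that newer gym versions dispatch to `action()`/`reverse_action()` rather than the old underscore-prefixed hooks, and that the rescaled action should also be clipped to the action-space bounds:

```python
import gym
import numpy as np

# Sketch of a normalized-action wrapper. Newer gym versions call `action()`
# (not `_action()`), so defining only `_action` means the rescaling is never applied.
class NormalizedActions(gym.ActionWrapper):
    def action(self, action):
        # map agent output in [-1, 1] to the env's [low, high] range
        low = self.action_space.low
        high = self.action_space.high
        action = low + (action + 1.0) * 0.5 * (high - low)
        return np.clip(action, low, high)

    def reverse_action(self, action):
        # map an env-space action back to [-1, 1]
        low = self.action_space.low
        high = self.action_space.high
        action = 2.0 * (action - low) / (high - low) - 1.0
        return np.clip(action, -1.0, 1.0)
```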

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.15.2 to 2.7.2. Release notes sourced from tensorflow's releases: TensorFlow 2.7.2 Release 2.7.2 This release introduces several vulnerability fixes: Fixes a code injection in saved_model_cli (CVE-2022-29216) Fixes...

dependencies

In `dist = Normal(mu, sigma)`, `sigma` should be a positive value, but the actor_net output can be negative, so `action_log_prob = dist.log_prob(action)` can be `nan`. Try: ``` import torch a...
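The suggested snippet is truncated; a minimal sketch of one common remedy, assuming the actor network emits an unconstrained value that is then mapped to a strictly positive sigma (the variable names here are illustrative, not the repo's):

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

# Illustrative actor outputs; `sigma_raw` may be negative, as reported above.
mu_raw = torch.randn(4)
sigma_raw = torch.randn(4)

# Force sigma > 0 before constructing the distribution.
sigma = F.softplus(sigma_raw) + 1e-6
# alternative: sigma = torch.exp(sigma_raw.clamp(-20, 2))

dist = Normal(mu_raw, sigma)
action = dist.rsample()
action_log_prob = dist.log_prob(action)  # no longer nan from sigma <= 0
```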

The update of the value network should be:

```python
alpha_w = 1e-3  # initialize learning rate
optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)
optimizer_w.zero_grad()
policy_loss_w = -delta
policy_loss_w.backward(retain_graph=True)
clip_grad_norm_(s_value_func.parameters(), 0.1)  # clip the network's gradients, not the loss
optimizer_w.step()
```

log_prob should be multiplied by temperature factor (alpha) when calculating pi_loss in ALL implementations of SAC.
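For reference, a minimal sketch of the SAC policy loss with the temperature factor applied (hypothetical variable names, not tied to any particular file in the repo):

```python
import torch

# pi_loss = E[alpha * log pi(a|s) - Q(s, a)]
# `log_prob` and `q_value` are assumed to come from the current policy sample
# and the critic, respectively.
def sac_pi_loss(log_prob: torch.Tensor, q_value: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    return (alpha * log_prob - q_value).mean()
```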

In Line 224: `args.max_length_of_trajectory` is missing. I know this refers to the maximum step length, but how big is this param usually?

The result is still around 40 after many steps on Hopper. Are there any hyperparameter tunings? ![image](https://user-images.githubusercontent.com/11004576/144832360-7e9c4d8a-1dc2-40f9-8953-a8d2991c84de.png)

I don't know if it's because of my device or the program, but Pendulum-v0 just doesn't work so well on my machine. You see, the pendulum only moves one...