Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

32 Deep-reinforcement-learning-with-pytorch issues

Add a title "Deep-reinforcement-learning-with-pytorch" to readme.md

In SAC.py, `s = torch.tensor([t.s for t in self.replay_buffer]).float().to(device)` fails:

```
Traceback (most recent call last):
  File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 307, in <module>
    main()
  File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 293, in main
    agent.update()
  File...
```
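The traceback above is cut off, but a frequent cause of that exact line failing (or being extremely slow) is passing a Python list of numpy arrays straight to `torch.tensor`. A minimal sketch of a workaround, assuming the replay buffer stores transitions whose `t.s` fields are numpy state arrays (names follow the snippet above; `batch_states` is a hypothetical helper):

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical helper: stack the per-transition numpy state arrays first and
# convert once, rather than handing torch.tensor a Python list of arrays.
def batch_states(replay_buffer):
    states = np.array([t.s for t in replay_buffer], dtype=np.float32)
    return torch.from_numpy(states).to(device)
```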

In SAC.py and SAC_BipedalWalker-v2.py, the code:

```python
class NormalizedActions(gym.ActionWrapper):
    def _action(self, action):
        low = self.action_space.low
        high = self.action_space.high
        action = low + (action + 1.0) * 0.5 * (high - low)
        ...
```
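The preview is truncated, so the exact complaint is unclear; for reference, a sketch of a complete wrapper is below, assuming the concern is that newer gym versions dispatch to `action()`/`reverse_action()` rather than the old underscore-prefixed hooks, and that the rescaled action should also be clipped to the action-space bounds:

```python
import gym
import numpy as np

# Sketch of a normalized-action wrapper. Newer gym versions call `action()`
# (not `_action()`), so defining only `_action` means the rescaling is never applied.
class NormalizedActions(gym.ActionWrapper):
    def action(self, action):
        # map agent output in [-1, 1] to the env's [low, high] range
        low = self.action_space.low
        high = self.action_space.high
        action = low + (action + 1.0) * 0.5 * (high - low)
        return np.clip(action, low, high)

    def reverse_action(self, action):
        # map an env-space action back to [-1, 1]
        low = self.action_space.low
        high = self.action_space.high
        action = 2.0 * (action - low) / (high - low) - 1.0
        return np.clip(action, -1.0, 1.0)
```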

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.15.2 to 2.7.2. Release notes sourced from tensorflow's releases: TensorFlow 2.7.2 Release 2.7.2 This release introduces several vulnerability fixes: Fixes a code injection in saved_model_cli (CVE-2022-29216) Fixes...

dependencies

In `dist = Normal(mu, sigma)`, `sigma` should be a positive value, but the actor_net output can be negative, so `action_log_prob = dist.log_prob(action)` can be `nan`. Try: ``` import torch a...
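The suggested snippet is truncated; a minimal sketch of one common remedy, assuming the actor network emits an unconstrained value that is then mapped to a strictly positive sigma (the variable names here are illustrative, not the repo's):

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

# Illustrative actor outputs; `sigma_raw` may be negative, as reported above.
mu_raw = torch.randn(4)
sigma_raw = torch.randn(4)

# Force sigma > 0 before constructing the distribution.
sigma = F.softplus(sigma_raw) + 1e-6
# alternative: sigma = torch.exp(sigma_raw.clamp(-20, 2))

dist = Normal(mu_raw, sigma)
action = dist.rsample()
action_log_prob = dist.log_prob(action)  # no longer nan from sigma <= 0
```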

The update of the value network should be:

```python
alpha_w = 1e-3  # initialize learning rate
optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)
optimizer_w.zero_grad()
policy_loss_w = -delta
policy_loss_w.backward(retain_graph=True)
clip_grad_norm_(s_value_func.parameters(), 0.1)  # clip the network's gradients, not the loss
optimizer_w.step()
```

log_prob should be multiplied by temperature factor (alpha) when calculating pi_loss in ALL implementations of SAC.
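For reference, a minimal sketch of the SAC policy loss with the temperature factor applied (hypothetical variable names, not tied to any particular file in the repo):

```python
import torch

# pi_loss = E[alpha * log pi(a|s) - Q(s, a)]
# `log_prob` and `q_value` are assumed to come from the current policy sample
# and the critic, respectively.
def sac_pi_loss(log_prob: torch.Tensor, q_value: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    return (alpha * log_prob - q_value).mean()
```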

In Line 224: `args.max_length_of_trajectory` is missing. I know this refers to the maximum step length, but how big is this param usually?

The result is still around 40 after many steps on Hopper. Are there any hyperparameter tunings? ![image](https://user-images.githubusercontent.com/11004576/144832360-7e9c4d8a-1dc2-40f9-8953-a8d2991c84de.png)

I don't know if it's because of my device or the program, but Pendulum-v0 just doesn't work so well on my machine. You see, the pendulum only moves one...