Deep-reinforcement-learning-with-pytorch Big bug in PPO2

Big bug in PPO2

Open Vinson-sheep opened this issue 3 years ago • 3 comments

In dist = Normal(mu, sigma), sigma must be a positive value, but the actor_net output can be negative, so action_log_prob = dist.log_prob(action) can be nan.

Try:

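import torch
from torch.distributions import Normal  # Normal lives in torch.distributions

a = torch.FloatTensor([1]).cuda()   # mean
b = torch.FloatTensor([-1]).cuda()  # negative standard deviation (invalid)
dist = Normal(a, b)  # recent PyTorch versions may reject this with a ValueError
action = dist.sample()
action_log_prob = dist.log_prob(action)  # takes the log of the scale, nan if negative

print(action.cpu().numpy())
print(action_log_prob.item())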
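
To see where the nan comes from, here is a minimal sketch of the standard Gaussian log-density written out by hand. The helper name normal_log_prob is hypothetical and this is not necessarily the exact library code; it only shows that the formula contains a log(sigma) term, which is undefined for a negative sigma.

```python
import math
import torch

def normal_log_prob(x, mu, sigma):
    # Gaussian log-density:
    #   -(x - mu)^2 / (2 * sigma^2) - log(sigma) - 0.5 * log(2 * pi)
    return (
        -((x - mu) ** 2) / (2 * sigma ** 2)
        - torch.log(sigma)
        - 0.5 * math.log(2 * math.pi)
    )

x = torch.tensor([0.5])
mu = torch.tensor([1.0])

# Negative sigma: torch.log of a negative number is nan, so the result is nan.
bad = normal_log_prob(x, mu, torch.tensor([-1.0]))
# Positive sigma: every term is finite.
good = normal_log_prob(x, mu, torch.tensor([1.0]))

print(bad, good)
```

The squared term hides the sign of sigma, so only the log(sigma) term breaks, which is why the sampled action can look normal while its log-probability is nan.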

Feb 16 '22 09:02 Vinson-sheep

So how can I fix the bug?

Apr 15 '22 12:04 jzl20

return sigma*sigma

Sep 15 '22 08:09 flyinglife001

You can add an activation function to the sigma output of the actor network. Using a relu or softplus function keeps sigma from going negative (softplus keeps it strictly positive). Hope it helps.
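
A minimal sketch of that suggestion, assuming a small actor with separate mu and sigma heads. The class name Actor, the layer sizes, and the eps value are all hypothetical and not taken from the repository's PPO2 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

class Actor(nn.Module):
    """Hypothetical actor; sizes and names are illustrative assumptions."""

    def __init__(self, state_dim=3, hidden_dim=64, eps=1e-5):
        super().__init__()
        self.fc = nn.Linear(state_dim, hidden_dim)
        self.mu_head = nn.Linear(hidden_dim, 1)
        self.sigma_head = nn.Linear(hidden_dim, 1)
        self.eps = eps  # small floor so sigma never reaches exactly zero

    def forward(self, state):
        h = F.relu(self.fc(state))
        mu = self.mu_head(h)
        # softplus maps any real number to a positive one
        sigma = F.softplus(self.sigma_head(h)) + self.eps
        return mu, sigma

torch.manual_seed(0)
actor = Actor()
state = torch.randn(4, 3)  # batch of 4 dummy states

mu, sigma = actor(state)
dist = Normal(mu, sigma)
action = dist.sample()
action_log_prob = dist.log_prob(action)
print(sigma.min().item(), action_log_prob.shape)
```

Squaring the output (sigma*sigma) or applying relu also removes negative values, but both can return exactly zero, which Normal does not accept as a scale either; that is the reason for the small eps floor in this sketch.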

Apr 07 '23 06:04 WhiteNightSleepless
