Deep-reinforcement-learning-with-pytorch
Big bug in PPO2
In dist = Normal(mu, sigma), sigma should be a positive value, but the actor_net output can be negative, so action_log_prob = dist.log_prob(action) can be nan.
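For reference, the standard log-density of a normal distribution shows where the nan comes from. It is written here with the same mu and sigma as above, with a standing for the sampled action:

```latex
\log p(a \mid \mu, \sigma) = -\frac{(a - \mu)^2}{2\sigma^2} - \log \sigma - \frac{1}{2}\log(2\pi)
```

The \log \sigma term has no real value for a negative sigma, so evaluating it in floating point returns nan.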
Try:
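import torch
from torch.distributions import Normal

a = torch.FloatTensor([1]).cuda()   # mu
b = torch.FloatTensor([-1]).cuda()  # sigma, deliberately negative
# Recent PyTorch versions may reject the negative sigma here with a ValueError
dist = Normal(a, b)
action = dist.sample()
action_log_prob = dist.log_prob(action)
print(action.cpu().numpy())
print(action_log_prob.item())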
So how can I fix the bug?
return sigma*sigma
You can add an activation function before the sigma output of the actor network. Applying softplus makes sigma strictly positive; relu also removes negative values but can output exactly 0, which is still not a valid sigma. Hope it helps.
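A minimal sketch of the softplus suggestion, assuming a small Gaussian-policy actor with separate mu and sigma heads. The class name Actor, the layer sizes, and the min_sigma floor are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal


class Actor(nn.Module):
    """Hypothetical actor network; sizes are placeholders."""

    def __init__(self, num_state=3, hidden=64, min_sigma=1e-5):
        super().__init__()
        self.fc = nn.Linear(num_state, hidden)
        self.mu_head = nn.Linear(hidden, 1)
        self.sigma_head = nn.Linear(hidden, 1)
        self.min_sigma = min_sigma  # assumed small floor, not from the thread

    def forward(self, x):
        x = F.relu(self.fc(x))
        mu = self.mu_head(x)
        # softplus(z) = log(1 + exp(z)) is positive; the floor guards
        # against it underflowing to exactly 0 for very negative z.
        sigma = F.softplus(self.sigma_head(x)) + self.min_sigma
        return mu, sigma


torch.manual_seed(0)
actor = Actor()
state = torch.randn(4, 3)  # batch of 4 dummy states
mu, sigma = actor(state)

dist = Normal(mu, sigma)
action = dist.sample()
action_log_prob = dist.log_prob(action)

print(bool((sigma > 0).all()))
print(bool(torch.isnan(action_log_prob).any()))
```

Squaring the raw output, as in the earlier reply, also removes negative values but can still produce exactly 0, which is why the sketch adds a small floor.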