ElegantRL
maybe a small bug in the function `explore_vec_env` of discretePPO and discreteA2C?
In the function `explore_vec_env` of AgentPPO, the variable `actions` is expected to have shape [horizon_len, self.num_envs, 1], but the expression `convert(action)` returns a 1-dim tensor of shape [num_envs], when it should actually be [num_envs, 1], as it is in `explore_vec_env` of AgentD3QN. This indeed fails the demo examples/demo_A2C_PPO.py.
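For concreteness, a minimal sketch of the mismatch (hypothetical sizes; `Categorical` stands in for `self.ActionDist`):

import torch
from torch.distributions import Categorical

num_envs, action_dim = 4, 3  # hypothetical sizes
a_prob = torch.softmax(torch.randn(num_envs, action_dim), dim=1)
a_dist = Categorical(a_prob)

action = a_dist.sample()
print(action.shape)               # torch.Size([4])    -- 1-dim, shape [num_envs]
print(action.unsqueeze(1).shape)  # torch.Size([4, 1]) -- the expected [num_envs, 1]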
The following change works for me:
# ActorDiscretePPO of net.py
def get_action(self, state: Tensor) -> (Tensor, Tensor):
    state = self.state_norm(state)
    a_prob = self.soft_max(self.net(state))
    a_dist = self.ActionDist(a_prob)
    action = a_dist.sample()
    logprob = a_dist.log_prob(action)
    return action.unsqueeze(1), logprob  # unsqueeze so the action has shape [num_envs, 1]
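As a quick sanity check (a sketch, reusing the hypothetical `a_dist` above): stacking the unsqueezed samples over `horizon_len` steps now yields the buffer shape that `explore_vec_env` expects.

horizon_len = 8  # hypothetical
actions = torch.stack([a_dist.sample().unsqueeze(1) for _ in range(horizon_len)])
print(actions.shape)  # torch.Size([8, 4, 1]) == [horizon_len, num_envs, 1]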