Deep-Reinforcement-Learning-Algorithms-with-PyTorch actor loss for MultiDiscrete action space

actor loss for MultiDiscrete action space

Open olivia2222 opened this issue 1 year ago • 0 comments


Hi,

I want to use the SAC algorithm in a MultiDiscrete action space. In a Discrete action space, the actor loss is calculated as follows: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/4835bac8557fdacff1735eca004e35ea5a4b7443/agents/actor_critic_agents/SAC_Discrete.py#L83-L88

In a MultiDiscrete action space, the shapes of log_action_probabilities, action_probabilities, qf1_pi, and qf2_pi are all [batch_size, num_action_dim, num_actions_per_dim]. Can you give me some hints on how to calculate policy_loss in a MultiDiscrete action space? Should I apply sum(-1) twice so that the per-sample policy_loss has shape [batch_size] before it is averaged over the batch?
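To make the question concrete, here is a minimal sketch of the "sum(-1) twice" idea. It is not taken from the repository: the function name `multidiscrete_policy_loss`, the toy tensors, and the value of `alpha` are hypothetical. It also assumes that the policy factorizes across action dimensions and that the critics output one row of Q-values per action dimension, so that summing the per-dimension terms is meaningful. Whether that decomposition suits a given environment is a separate question.

```python
import torch


def multidiscrete_policy_loss(action_probs, log_action_probs, qf1_pi, qf2_pi, alpha):
    """Hypothetical helper; every tensor argument has shape
    [batch_size, num_action_dim, num_actions_per_dim]."""
    # Pessimistic Q estimate, taken element-wise over the two critics.
    min_qf_pi = torch.min(qf1_pi, qf2_pi)
    inside_term = alpha * log_action_probs - min_qf_pi
    # First sum(-1): expectation over the choices within each action dimension
    # -> [batch_size, num_action_dim]
    per_dim = (action_probs * inside_term).sum(dim=-1)
    # Second sum(-1): add up the action dimensions (assumes a factorized policy)
    # -> [batch_size]
    per_sample = per_dim.sum(dim=-1)
    # Average over the batch to get a scalar loss.
    return per_sample.mean(), per_sample


# Toy inputs with the shapes described in the question.
torch.manual_seed(0)
batch_size, num_action_dim, num_actions_per_dim = 4, 3, 5
logits = torch.randn(batch_size, num_action_dim, num_actions_per_dim)
log_action_probs = torch.log_softmax(logits, dim=-1)  # normalized per action dimension
action_probs = log_action_probs.exp()
qf1_pi = torch.randn(batch_size, num_action_dim, num_actions_per_dim)
qf2_pi = torch.randn(batch_size, num_action_dim, num_actions_per_dim)
alpha = 0.2  # placeholder entropy temperature

policy_loss, per_sample = multidiscrete_policy_loss(
    action_probs, log_action_probs, qf1_pi, qf2_pi, alpha
)
```

Under these assumptions, applying sum(-1) twice is the same as a single `sum(dim=(1, 2))`. Replacing the second sum with a mean over action dimensions would only rescale the loss by `1 / num_action_dim`.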

Sep 28 '24 15:09 olivia2222
