Carla-ppo
Sub-policy question
Hi! First, in ppo.py you have:

```python
self.loss = -self.policy_loss + self.value_loss - self.entropy_loss
self.train_step = self.optimizer.minimize(self.loss, var_list=policy_params)
```

You said "Reduce sum over all sub-policies (where only the active sub-policy will be non-zero due to previous filtering)", but the loss will be a list of per-sub-policy losses. How can a list of losses be backpropagated?
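To make my question concrete, here is how I understand the filtering is supposed to work (a minimal NumPy sketch with made-up numbers, not the repo's actual code):

```python
import numpy as np

# Hypothetical per-sub-policy losses (one scalar each) and a one-hot
# mask selecting the currently active sub-policy (index 1 here).
sub_policy_losses = np.array([0.7, 1.3, 0.2])
active_mask = np.array([0.0, 1.0, 0.0])

# The filtering zeroes out the inactive sub-policies, so a reduce-sum
# collapses the list into a single scalar: the active sub-policy's loss.
total_loss = np.sum(sub_policy_losses * active_mask)
print(total_loss)  # 1.3
```

So the sum over the masked list should already be a single scalar that `optimizer.minimize` can handle. Is this the intended behavior?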
Second, you first call `_create_sub_policy`; inside it, the loss is reduced with `reduce_mean` and finally becomes a scalar. After the filtering, won't all sub-policy modules output the same value? Does this really work?