Carla-ppo
Sub-policy question
Hi! First, in ppo.py you have:

```python
self.loss = -self.policy_loss + self.value_loss - self.entropy_loss
self.train_step = self.optimizer.minimize(self.loss, var_list=policy_params)
```

You said "Reduce sum over all sub-policies (where only the active sub-policy will be non-zero due to previous filtering)", but the loss will be a list of per-sub-policy losses. How can a list of losses be backpropagated?
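To make my question concrete, here is how I understand the filtering is supposed to work (a minimal NumPy sketch with made-up numbers, not the repo's actual code):

```python
import numpy as np

# Hypothetical per-sub-policy losses (one scalar each) and a one-hot
# mask selecting the currently active sub-policy (index 1 here).
sub_policy_losses = np.array([0.7, 1.3, 0.2])
active_mask = np.array([0.0, 1.0, 0.0])

# The filtering zeroes out the inactive sub-policies, so a reduce-sum
# collapses the list into a single scalar: the active sub-policy's loss.
total_loss = np.sum(sub_policy_losses * active_mask)
print(total_loss)  # 1.3
```

So the sum over the masked list should already be a single scalar that `optimizer.minimize` can handle. Is this the intended behavior?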
Second, you first call `_create_sub_policy`; inside it, the loss is reduced with `reduce_mean` and finally becomes a scalar. After the filtering, won't all sub-policy modules output the same value? Does this really work?