BipedalWalkerHardcore-SAC
why not update alpha?
```python
# alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
# self.alpha_optim.zero_grad()
# alpha_loss.backward()
# self.alpha_optim.step()
```
Why is this update commented out?
It's an experiment to test whether it is better to use a fixed alpha. I found a result in the SAC paper suggesting that a fixed alpha might work better, provided its value is chosen well.
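For reference, here is a minimal sketch of the two options being discussed, assuming a PyTorch setup. The alpha loss line matches the commented-out code in the question; the `auto_alpha` flag, class name, and constructor arguments are illustrative and not part of this repo.

```python
import torch


class TemperatureHolder:
    """Sketch of fixed alpha vs. automatic entropy tuning in SAC.

    Assumes `log_pi` is the log-probability of the sampled action and that
    the entropy target is set to -action_dim, the heuristic from the SAC paper.
    """

    def __init__(self, action_dim, auto_alpha=True, fixed_alpha=0.2, lr=3e-4):
        self.auto_alpha = auto_alpha
        self.fixed_alpha = fixed_alpha
        self.target_entropy = -float(action_dim)
        # log_alpha is learned so that alpha = exp(log_alpha) stays positive.
        self.log_alpha = torch.zeros(1, requires_grad=True)
        self.alpha_optim = torch.optim.Adam([self.log_alpha], lr=lr)

    @property
    def alpha(self):
        if self.auto_alpha:
            return self.log_alpha.exp().item()
        return self.fixed_alpha

    def update(self, log_pi):
        """One gradient step on alpha; a no-op when alpha is fixed."""
        if not self.auto_alpha:
            return None
        # Same loss as the commented-out lines above: alpha is pushed up when
        # policy entropy falls below the target, and down otherwise.
        alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
        self.alpha_optim.zero_grad()
        alpha_loss.backward()
        self.alpha_optim.step()
        return alpha_loss.item()
```

With `auto_alpha=False` this reduces to the current behavior of the repo (a constant temperature), while `auto_alpha=True` restores the update from the commented-out code.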