softlearning
Using a deterministic policy in environments like LunarLander?
Hi, and thank you for such a great algorithm. I have a question about running SAC with a deterministic policy, i.e. taking the mean (mu) of the Gaussian policy as the action, in environments like LunarLander. Is it guaranteed to converge in that setting? I ask because I see some trials fail to converge, especially on LunarLander and Humanoid-v3.
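To make the question concrete, here is a minimal sketch of what I mean by "using mu". It is not code from this repository: the function name `select_action`, the example values, and the assumption that the policy head outputs `mu` and `log_std` for a tanh-squashed diagonal Gaussian are all my own, for illustration only.

```python
import numpy as np

def select_action(mu, log_std, deterministic, rng):
    """Hypothetical helper: pick an action from a tanh-squashed Gaussian policy head.

    mu, log_std: arrays of shape (action_dim,), assumed to come from the policy network.
    deterministic: if True, use the mean (mu) instead of sampling.
    """
    if deterministic:
        # Deterministic mode: ignore the standard deviation and use the mean.
        pre_tanh = mu
    else:
        # Stochastic mode: reparameterized sample mu + std * noise.
        noise = rng.standard_normal(mu.shape)
        pre_tanh = mu + np.exp(log_std) * noise
    # Squash into (-1, 1), assuming the environment's actions are scaled to that range.
    return np.tanh(pre_tanh)

# Made-up policy outputs for a 2-dimensional action space.
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
log_std = np.array([-0.5, 0.0])

greedy_action = select_action(mu, log_std, deterministic=True, rng=rng)
sampled_action = select_action(mu, log_std, deterministic=False, rng=rng)
```

In my runs, the deterministic mode corresponds to `greedy_action` above, and that is the setting where some trials do not seem to converge.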