SAC alpha update problem

Open. Shapeno opened this issue 1 year ago • 1 comment

In `obj_alpha = (self.alpha_log * (self.target_entropy - log_prob).detach()).mean()`, when `alpha_log` is 0, `alpha` will stay at 1 forever. The correct form is `obj_alpha = (self.alpha * (self.target_entropy - log_prob).detach()).mean()`.
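To compare the two forms side by side, here is a minimal sketch. It is not ElegantRL code. It uses plain Python and a central finite difference in place of PyTorch autograd so it runs without extra dependencies. The values of `target_entropy` and `log_prob` and the helper names `obj_alpha` and `grad_wrt_alpha_log` are hypothetical, chosen only for illustration. The bracket `(target_entropy - log_prob)` is held constant, which is the role `.detach()` plays in the original expression.

```python
import math

# Hypothetical values, chosen only for illustration.
target_entropy = -1.0
log_prob = [-0.5, -1.5, -2.0]


def obj_alpha(alpha_log: float, use_alpha: bool) -> float:
    """Mean of coef * (target_entropy - log_prob).

    coef is alpha_log itself (the form quoted from AgentSAC.py) or
    alpha = exp(alpha_log) (the form proposed in this issue).
    The bracket is treated as a constant, mirroring .detach().
    """
    coef = math.exp(alpha_log) if use_alpha else alpha_log
    return sum(coef * (target_entropy - lp) for lp in log_prob) / len(log_prob)


def grad_wrt_alpha_log(alpha_log: float, use_alpha: bool, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of d(obj_alpha) / d(alpha_log)."""
    upper = obj_alpha(alpha_log + eps, use_alpha)
    lower = obj_alpha(alpha_log - eps, use_alpha)
    return (upper - lower) / (2 * eps)


# Print the gradient each form sends to alpha_log at two starting points.
for start in (0.0, 1.0):
    print(
        f"alpha_log={start}: "
        f"alpha_log form -> {grad_wrt_alpha_log(start, use_alpha=False):.6f}, "
        f"alpha form -> {grad_wrt_alpha_log(start, use_alpha=True):.6f}"
    )
```

The first form is linear in `alpha_log`, so its gradient is the mean of the detached bracket whatever the current value of `alpha_log`; the second form scales that same mean by `alpha = exp(alpha_log)`. Running the sketch at a few starting points makes it easy to see how the two updates differ.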

This problem is also found in rlkit.

For the algorithm details, see the source code accompanying "Soft Actor-Critic Algorithms and Applications": https://github.com/rail-berkeley/softlearning/blob/13cf187cc93d90f7c217ea2845067491c3c65464/softlearning/algorithms/sac.py#L256

Shapeno, Feb 29 '24 09:02

https://github.com/AI4Finance-Foundation/ElegantRL/blob/b4b9d662b9f9cb7cc368ac2b1036b5119eb20be4/elegantrl/agents/AgentSAC.py#L48C13-L48C23

Shapeno, Feb 29 '24 09:02