ElegantRL
A question about the code of the 'ActorSAC' class in net.py
I'm confused about why we use 'logprob = dist.log_prob(a_avg)' instead of 'logprob = dist.log_prob(action)' in line 247 of elegantrl/agents/net.py. I think the latter is consistent with the original paper. Does the former work better in experiments?
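For reference, the change of variables the question refers to is written in Appendix C of the SAC paper as follows, with $u \sim \mathcal{N}(\mu, \sigma^2)$, $a = \tanh(u)$ and $D$ action dimensions (in the snippet below, $u$ corresponds to action, $\mu$ to a_avg and $\sigma$ to a_std). Substituting the reparameterized sample $u = \mu + \sigma \odot \epsilon$ shows exactly which term evaluating the density at the mean leaves out: the sample-dependent $-\tfrac{1}{2}\sum_i \epsilon_i^2$.

$$
\log \pi(a \mid s) = \log \mathcal{N}(u;\, \mu, \sigma^2) - \sum_{i=1}^{D} \log\bigl(1 - \tanh^2(u_i)\bigr)
$$

$$
\log \mathcal{N}(u;\, \mu, \sigma^2) = \log \mathcal{N}(\mu;\, \mu, \sigma^2) - \frac{1}{2}\sum_{i=1}^{D} \epsilon_i^2,
\qquad
\log \mathcal{N}(\mu;\, \mu, \sigma^2) = -\sum_{i=1}^{D}\Bigl(\log \sigma_i + \tfrac{1}{2}\log 2\pi\Bigr)
$$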
def get_action_logprob(self, state):
    state = self.state_norm(state)
    s_enc = self.net_s(state)  # encoded state
    a_avg, a_std_log = self.net_a(s_enc).chunk(2, dim=1)  # mean and log-std of the Gaussian
    a_std = a_std_log.clamp(-16, 2).exp()
    dist = Normal(a_avg, a_std)
    action = dist.rsample()  # reparameterized sample: a_avg + a_std * noise
    action_tanh = action.tanh()  # squash the action into (-1, 1)
    logprob = dist.log_prob(a_avg)  # the line in question: density evaluated at the mean, not at the sampled action
    logprob -= (-action_tanh.pow(2) + 1.000001).log()  # fix logprob using the derivative of action.tanh()
    return action_tanh, logprob.sum(1)
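Because rsample() builds action as a_avg + a_std * noise, the quadratic term that log_prob(a_avg) drops equals noise**2 / 2, and with the noise held fixed it has no gradient with respect to the mean or log-std. A minimal pure-Python sketch of a single action dimension can check this numerically; it is not ElegantRL code, it avoids PyTorch, and the helper names (normal_log_prob, logprob_at_sample, logprob_at_mean, central_diff_grad) are hypothetical.

```python
import math

# Pure-Python sketch (no PyTorch) of ONE action dimension.
# mu and log_std stand in for a_avg and a_std_log; eps is the fixed noise drawn by rsample().

def normal_log_prob(x, mu, std):
    # log N(x; mu, std^2): the Gaussian log-density
    return -((x - mu) ** 2) / (2 * std ** 2) - math.log(std) - 0.5 * math.log(2 * math.pi)

def logprob_at_sample(mu, log_std, eps):
    std = math.exp(log_std)
    action = mu + std * eps  # reparameterized sample, built the way rsample() builds it
    return normal_log_prob(action, mu, std)  # analogous to dist.log_prob(action)

def logprob_at_mean(mu, log_std, eps):
    std = math.exp(log_std)
    return normal_log_prob(mu, mu, std)  # analogous to dist.log_prob(a_avg); eps is unused

def central_diff_grad(f, mu, log_std, eps, h=1e-6):
    # numerical gradient w.r.t. (mu, log_std) with the noise eps held fixed
    d_mu = (f(mu + h, log_std, eps) - f(mu - h, log_std, eps)) / (2 * h)
    d_log_std = (f(mu, log_std + h, eps) - f(mu, log_std - h, eps)) / (2 * h)
    return d_mu, d_log_std

mu, log_std, eps = 0.3, -0.5, 1.7
gap = logprob_at_mean(mu, log_std, eps) - logprob_at_sample(mu, log_std, eps)
print(gap, 0.5 * eps ** 2)  # both about 1.445: the gap is eps**2 / 2
print(central_diff_grad(logprob_at_sample, mu, log_std, eps))  # about (0.0, -1.0)
print(central_diff_grad(logprob_at_mean, mu, log_std, eps))    # about (0.0, -1.0)
```

Averaged over noise drawn from N(0, 1), the shift is 0.5 per action dimension, so anywhere the logprob value itself is consumed (an entropy bonus or a temperature update, for example) it is offset, while the gradient of the Gaussian term with respect to the mean and log-std is the same. The tanh correction term uses the sampled action in both versions.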
Yes, the former is better.
You can read the webpage below for more information.
TensorFlow Probability commit "Update tanh bijector with numerically stable formula": https://github.com/tensorflow/probability/commit/ef6bb176e0ebd1cf6e25c6b5cecdd2428c22963f#diff-e120f70e92e6741bca649f04fcd907b7
def log_abs_det_jacobian(self, x, y):
    # We use a formula that is more numerically stable, see details in the following link
    # https://github.com/tensorflow/probability/commit/ef6bb176e0ebd1cf6e25c6b5cecdd2428c22963f#diff-e120f70e92e6741bca649f04fcd907b7
    return 2. * (math.log(2.) - x - F.softplus(-2. * x))
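The snippet above matches the body of PyTorch's TanhTransform.log_abs_det_jacobian. It relies on the identity 1 - tanh(x)^2 = 4 e^(-2x) / (1 + e^(-2x))^2, so log(1 - tanh(x)^2) = 2 * (log 2 - x - softplus(-2x)), which never forms 1 - tanh(x)^2 explicitly. A minimal pure-Python sketch (helper names are hypothetical, not part of PyTorch or ElegantRL) compares it with the log(1.000001 - tanh^2) correction used in get_action_logprob:

```python
import math

def softplus(z):
    # numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def tanh_log_det_stable(x):
    # log|d tanh(x)/dx| = log(1 - tanh(x)^2), in the rewritten form used by the snippet above
    return 2.0 * (math.log(2.0) - x - softplus(-2.0 * x))

def tanh_log_det_eps(x, eps=1e-6):
    # the form used in get_action_logprob: log(1 + eps - tanh(x)^2)
    return math.log(1.0 + eps - math.tanh(x) ** 2)

for x in (0.0, 1.0, 3.0, 10.0, 20.0):
    print(x, tanh_log_det_stable(x), tanh_log_det_eps(x))
```

Near x = 0 the two agree closely. As 1 - tanh(x)^2 shrinks toward the 1e-6 offset they diverge, and once tanh(x) is within rounding of 1.0 in double precision the epsilon form bottoms out near log(1e-6), about -13.8, while the stable form keeps decreasing (about -38.6 at x = 20).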
https://github.com/AI4Finance-Foundation/ElegantRL/blob/dee9c6d095001bf8365c0359f0d04a021d8c1e22/elegantrl/agents/net.py
ElegantRL code comment https://github.com/AI4Finance-Foundation/ElegantRL/blob/dee9c6d095001bf8365c0359f0d04a021d8c1e22/elegantrl/agents/net.py#L344-L359