pfrl
Discrepancy in SAC on entropy coefficient update
I noticed that here the log_prob variable is computed before the actor update, whereas in the official SAC repo it is recomputed after the actor update (the paper also mentions in Section 6 that both the Q-function and the policy are updated before the entropy coefficient). Have you by any chance compared whether this detail makes a difference?
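To make the question concrete, here is a minimal sketch of why the ordering could matter. It is not pfrl's code or the official SAC code: it assumes a 1-D Gaussian policy, a temperature parameterized as alpha = exp(log_alpha), and the loss J(alpha) = -alpha * (mean log_prob + target_entropy). All function names and the before/after standard deviations are hypothetical, chosen only for illustration.

```python
import math
import random


def gaussian_log_prob(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def mean_log_prob(mean, std, n_samples, rng):
    """Sample actions from the policy and average their log-probabilities."""
    total = 0.0
    for _ in range(n_samples):
        action = rng.gauss(mean, std)
        total += gaussian_log_prob(action, mean, std)
    return total / n_samples


def temperature_grad(log_alpha, avg_log_prob, target_entropy):
    """Gradient of J(alpha) = -alpha * (avg_log_prob + target_entropy)
    with respect to log_alpha, where alpha = exp(log_alpha)."""
    return -math.exp(log_alpha) * (avg_log_prob + target_entropy)


rng = random.Random(0)
log_alpha = 0.0         # alpha = 1
target_entropy = -1.0   # assumed heuristic: -(action dimension)

# Hypothetical policy std before and after one (exaggerated) actor update.
std_before, std_after = 1.0, 0.5

# log_prob computed before the actor update (stale policy)
stale = mean_log_prob(0.0, std_before, 10_000, rng)
# log_prob recomputed after the actor update (fresh policy)
fresh = mean_log_prob(0.0, std_after, 10_000, rng)

grad_stale = temperature_grad(log_alpha, stale, target_entropy)
grad_fresh = temperature_grad(log_alpha, fresh, target_entropy)
print(f"grad with stale log_prob: {grad_stale:.3f}")
print(f"grad with fresh log_prob: {grad_fresh:.3f}")
```

The two gradients differ because the entropy estimate reflects whichever policy produced log_prob. The actor step here is deliberately exaggerated; a real gradient step presumably changes the policy much less, so whether the ordering affects training in practice remains an empirical question.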
You are right; this does seem to be a discrepancy from the official implementation. I do not remember whether I ran a comparison; probably not.
I see, no problem, thanks for replying anyways.