pfrl Discrepancy in SAC on entropy coefficient update

Open marioyc opened this issue 3 years ago • 2 comments

I noticed that here the `log_prob` variable is computed before the update of the actor, while in SAC's repo it is recomputed after the actor update (the paper also mentions in Section 6 that an update is made to both the Q-function and the policy before the update of the entropy coefficient). Have you by any chance compared whether this detail makes a difference?
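
To make the ordering question concrete, here is a minimal toy sketch in plain Python. It is not pfrl's code or the official SAC code. It assumes a 1-D Gaussian policy whose expected log-probability is known in closed form, a made-up actor step that only increases `log_std` (the Q-term is ignored), and a temperature objective of the form J(alpha) = -alpha * (log_prob + target_entropy). The names `expected_log_prob`, `temperature_grad`, `step`, and the learning rates are all hypothetical.

```python
import math


def expected_log_prob(log_std):
    """E[log pi(a)] for a 1-D Gaussian policy, i.e. minus its entropy."""
    return -(0.5 * math.log(2 * math.pi * math.e) + log_std)


def temperature_grad(log_prob, target_entropy):
    """dJ/dalpha for J(alpha) = -alpha * (log_prob + target_entropy)."""
    return -(log_prob + target_entropy)


def step(log_std, alpha, target_entropy, recompute_after_actor_update,
         actor_lr=0.1, alpha_lr=0.1):
    """One toy update of the actor followed by the temperature.

    Returns the new (log_std, alpha).
    """
    # log_prob under the policy *before* the actor update.
    log_prob = expected_log_prob(log_std)

    # Toy actor update (assumption): ascend alpha * entropy only,
    # whose gradient w.r.t. log_std is simply alpha.
    new_log_std = log_std + actor_lr * alpha

    if recompute_after_actor_update:
        # Variant described for SAC's repo: use the updated policy.
        log_prob = expected_log_prob(new_log_std)

    # Gradient descent on the temperature objective.
    new_alpha = alpha - alpha_lr * temperature_grad(log_prob, target_entropy)
    return new_log_std, new_alpha


# Same starting point, two orderings of the log_prob computation.
_, alpha_stale = step(0.0, 1.0, -1.0, recompute_after_actor_update=False)
_, alpha_fresh = step(0.0, 1.0, -1.0, recompute_after_actor_update=True)
gap = alpha_stale - alpha_fresh
```

In this toy setting the two variants differ after one step by `alpha_lr * actor_lr * alpha`, so the gap is second order in the learning rates. Whether that matters in real training is exactly the empirical question above, and this sketch does not answer it.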

Oct 25 '22 01:10 marioyc

You are right, it seems to be a discrepancy from the official implementation. I do not remember whether I made a comparison, maybe not.

Oct 25 '22 03:10 muupan

I see, no problem, thanks for replying anyways.

Oct 26 '22 02:10 marioyc
