Munchausen-RL Wrong value in call to F.softmax

Open marioyc opened this issue 5 years ago • 4 comments

Should F.softmax(Q_targets_next, dim=1) be F.softmax(Q_targets_next / entropy_tau, dim=1) instead?

Sep 14 '20 18:09 marioyc

For DQN it's only Q_targets_next, but for IQN you are right :)

Sep 14 '20 18:09 BY571

Oh, I didn't notice that. It seems to contradict equation 2, and it would also change the logsumexp calculations, since those assume the Q-values are divided by entropy_tau.
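To make the logsumexp point concrete, here is a minimal framework-free sketch (plain Python rather than the repo's PyTorch code). It assumes the policy is defined as softmax(Q / entropy_tau); under that assumption, the entropy-regularized value sum_a pi(a) * (Q(a) - entropy_tau * log pi(a)) equals entropy_tau * logsumexp(Q / entropy_tau), and the identity breaks when the softmax is taken over the unscaled Q-values. The numbers in q_next and the value of entropy_tau are made up for illustration, and the helper names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value(q, pi, tau):
    """Entropy-regularized value: sum_a pi(a) * (q(a) - tau * log pi(a))."""
    return sum(p * (qa - tau * math.log(p)) for p, qa in zip(pi, q))


# Illustrative values only (assumptions, not taken from the repo).
q_next = [1.0, 2.0, 0.5]
entropy_tau = 0.5

# Policy with the temperature applied, i.e. softmax(Q / entropy_tau).
pi_scaled = softmax([q / entropy_tau for q in q_next])
# Policy without the temperature, i.e. softmax(Q), as questioned in this issue.
pi_unscaled = softmax(q_next)

# Closed form that the logsumexp-based computation relies on.
closed_form = entropy_tau * logsumexp([q / entropy_tau for q in q_next])

value_scaled = soft_value(q_next, pi_scaled, entropy_tau)
value_unscaled = soft_value(q_next, pi_unscaled, entropy_tau)

print("closed form:        ", closed_form)
print("softmax(Q / tau):   ", value_scaled)
print("softmax(Q), no tau: ", value_unscaled)
```

With the scaled policy the two quantities agree up to floating-point error. With the unscaled one they do not, which is why mixing softmax(Q) with a logsumexp over Q / entropy_tau would be inconsistent.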

Sep 15 '20 00:09 marioyc

I confirmed with the author that it is a typo: the values should be divided by entropy_tau. There is also a TF implementation here: https://github.com/google-research/google-research/tree/master/munchausen_rl

Sep 18 '20 21:09 marioyc

@marioyc Thank you! I'll fix it :)

Sep 18 '20 21:09 BY571
