Munchausen-RL
Wrong value in call to F.softmax
Should F.softmax(Q_targets_next, dim=1) be F.softmax(Q_targets_next / entropy_tau, dim=1) instead?
For DQN it's only Q_targets_next:

but for IQN you are right :)
Oh, I didn't notice that. It seems to contradict equation 2, and it would also change the logsumexp calculations, since those assume the Q-values are divided by entropy_tau.
Confirmed with the author that it is a typo, the values should be divided by entropy_tau.
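For anyone who wants to check the consistency argument numerically, here is a minimal plain-Python sketch (no PyTorch needed). It assumes the scaled log-policy is computed as q - entropy_tau * logsumexp(q / entropy_tau), which is the standard log-softmax identity and not necessarily the exact code in this repo. The Q-values and the entropy_tau value below are made up for illustration, and the list stands in for a single row of Q_targets_next.

```python
import math

def softmax(values):
    # Numerically stable softmax over a plain list of floats.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a plain list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

# Hypothetical values: one row of Q_targets_next and a temperature.
q_targets_next = [1.0, 2.0, 0.5]
entropy_tau = 0.03

# Policy with the temperature applied, i.e. softmax(Q / entropy_tau).
scaled_q = [q / entropy_tau for q in q_targets_next]
pi_scaled = softmax(scaled_q)

# Scaled log-policy: tau * log pi = q - tau * logsumexp(q / tau).
lse = logsumexp(scaled_q)
tau_log_pi = [q - entropy_tau * lse for q in q_targets_next]

# Recover the policy from the scaled log-policy; it matches pi_scaled.
pi_from_log = [math.exp(t / entropy_tau) for t in tau_log_pi]

# Policy without the temperature, i.e. softmax(Q); it does not match.
pi_unscaled = softmax(q_targets_next)

print(pi_scaled)
print(pi_from_log)
print(pi_unscaled)
```

The first two printed lists agree up to floating-point error, while the third is a much flatter distribution. That is the mismatch described above: the expectation would be taken under a different policy than the one the logsumexp term corresponds to.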
Also, there is a TF implementation here: https://github.com/google-research/google-research/tree/master/munchausen_rl
@marioyc Thank you! I'll fix it :)