Mario Ynocente Castro
Hello, the initial set of baselines (Rainbow, PPO, DDDQN) is available here: https://github.com/minerllabs/baselines/tree/master/general/chainerrl/baselines We'll be releasing another set that uses the data soon.
I see, no problem. Thanks for replying anyway.
Oh, I didn't notice that. It seems to contradict equation 2, and it would also change the logsumexp calculations, since these assume the Q-values are divided by `entropy_tau`.
Confirmed with the author that it is a typo: the values should be divided by `entropy_tau`. There is also a TF implementation here: https://github.com/google-research/google-research/tree/master/munchausen_rl
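
For anyone landing here later, this is a minimal NumPy sketch of why the division matters. It computes `entropy_tau * log softmax(q / entropy_tau)` with the usual max-subtraction trick, so the Q-values are divided by `entropy_tau` inside the logsumexp. The function name `scaled_log_policy` is hypothetical and only `entropy_tau` comes from the discussion above; this is not the author's code or the linked TF implementation, just an illustration of the identity.

```python
import numpy as np


def scaled_log_policy(q_values, entropy_tau):
    """Return entropy_tau * log softmax(q_values / entropy_tau), computed stably.

    Hypothetical helper for illustration only; not taken from any library.
    """
    q = np.asarray(q_values, dtype=np.float64)
    # Subtract the per-row maximum so the exponentials below cannot overflow.
    advantages = q - q.max(axis=-1, keepdims=True)
    # The division by entropy_tau happens inside the logsumexp.
    logsumexp = np.log(
        np.sum(np.exp(advantages / entropy_tau), axis=-1, keepdims=True)
    )
    return advantages - entropy_tau * logsumexp


q = np.array([[1.0, 2.0, 3.0]])
tau = 0.03
scaled = scaled_log_policy(q, tau)
# Dividing by tau and exponentiating recovers a proper probability distribution.
policy = np.exp(scaled / tau)
```

If the division by `entropy_tau` is dropped inside the logsumexp, `policy` no longer sums to 1, which is one quick way to check an implementation.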