DADS reward implementation
Thank you for sharing your great code :)
I think the reward function differs slightly from the one defined in the paper (ICLR 2020): https://github.com/google-research/dads/blob/abc37f532c26658e41ae309b646e8963bd7a8676/unsupervised_skill_learning/dads_agent.py#L142-L144
As far as I understand, the first reward term defined in Eq. 6 of the paper is log q(s'|s,z) - log(\sum_{i=1}^{L} q(s'|s,z_i)). The reward in this repo, however, appears to be computed as \sum_{i=1}^{L} [log q(s'|s,z) - log q(s'|s,z_i)], using NumPy's broadcasting. Did I misunderstand something, or is there a practical technique I'm missing?
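To make the difference concrete, here is a minimal sketch of the two expressions as I read them. It is not the repo's code: the function names and the density values are made up for illustration, and it uses plain Python rather than NumPy so it runs on its own.

```python
import math


def reward_log_of_sum(logp, logp_alt):
    """log q(s'|s,z) - log(sum_i q(s'|s,z_i)), my reading of Eq. 6's first term."""
    # Log-sum-exp with the max subtracted, so small densities do not underflow.
    m = max(logp_alt)
    log_sum = m + math.log(sum(math.exp(lp - m) for lp in logp_alt))
    return logp - log_sum


def reward_sum_of_log_diffs(logp, logp_alt):
    """sum_i [log q(s'|s,z) - log q(s'|s,z_i)], my reading of the repo's reward."""
    return sum(logp - lp for lp in logp_alt)


# Hypothetical values: q(s'|s,z) = 0.5 and three sampled skills z_i.
logp = math.log(0.5)
logp_alt = [math.log(0.5), math.log(0.25), math.log(0.125)]

print(reward_log_of_sum(logp, logp_alt))
print(reward_sum_of_log_diffs(logp, logp_alt))
```

With these made-up numbers the first expression is log(0.5 / 0.875), about -0.56, while the second is log 8, about 2.08, so the two forms are not interchangeable in general.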