DADS reward implementation

[Open] slee01 opened this issue 3 years ago • 0 comments

Thank you for sharing your great code :)

I think the reward function differs slightly from the one defined in the paper (ICLR 2020): https://github.com/google-research/dads/blob/abc37f532c26658e41ae309b646e8963bd7a8676/unsupervised_skill_learning/dads_agent.py#L142-L144

As far as I understand, the first reward term defined in Eq. 6 of the paper is $\log q(s' \mid s, z) - \log \sum_{i=1}^{L} q(s' \mid s, z_i)$. However, the reward in this repo appears to be computed as $\sum_{i=1}^{L} \left[ \log q(s' \mid s, z) - \log q(s' \mid s, z_i) \right]$ using NumPy broadcasting. Have I misunderstood something, or is there a practical technique I'm missing?
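To make the difference between the two readings concrete, here is a minimal sketch that evaluates both for a single transition, assuming the log-densities $\log q(s' \mid s, z)$ and $\log q(s' \mid s, z_i)$ are already available as plain floats. The function names are hypothetical and are not taken from `dads_agent.py`. It also shows a common practical rewrite of the paper's term, $\log q - \log \sum_i q_i = -\log \sum_i \exp(\log q_i - \log q)$, which exponentiates only log-density differences (with a log-sum-exp max shift) so it does not underflow when the densities are tiny. When reading the linked lines, the key point is whether the broadcasted subtraction is followed by `exp` and a sum *before* the outer `log`: if so, the code computes the paper's term; if not, it computes the summed differences.

```python
import math
from typing import Sequence


def paper_reward(logp: float, logp_alt: Sequence[float]) -> float:
    """First term of Eq. 6, computed naively:
    log q(s'|s,z) - log sum_i q(s'|s,z_i)."""
    return logp - math.log(sum(math.exp(lq) for lq in logp_alt))


def paper_reward_stable(logp: float, logp_alt: Sequence[float]) -> float:
    """Same quantity via the identity
    log q - log sum_i q_i = -log sum_i exp(log q_i - log q).
    Only differences of log-densities are exponentiated, and the
    max shift (log-sum-exp trick) keeps exp() in a safe range."""
    diffs = [lq - logp for lq in logp_alt]
    m = max(diffs)
    return -(m + math.log(sum(math.exp(d - m) for d in diffs)))


def summed_difference(logp: float, logp_alt: Sequence[float]) -> float:
    """The reading described in the question:
    sum_i [log q(s'|s,z) - log q(s'|s,z_i)]."""
    return sum(logp - lq for lq in logp_alt)


if __name__ == "__main__":
    # q(s'|s,z) = 0.5; q(s'|s,z_1) = 0.25; q(s'|s,z_2) = 0.125
    logp = math.log(0.5)
    logp_alt = [math.log(0.25), math.log(0.125)]
    print(paper_reward(logp, logp_alt))         # log(0.5 / 0.375) = log(4/3)
    print(paper_reward_stable(logp, logp_alt))  # same value
    print(summed_difference(logp, logp_alt))    # log 2 + log 4 = log 8

    # With very small densities the naive form fails (exp underflows to 0,
    # so math.log(0.0) raises), while the stable form still works.
    print(paper_reward_stable(-1000.0, [-1001.0, -1002.0]))
```

For the toy numbers the two readings give $\log(4/3) \approx 0.29$ versus $\log 8 \approx 2.08$, so they are not the same quantity in general.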

Oct 31 '21 02:10 slee01
