ASE The implementation seems to be different from the method in the paper

Open zmccmzty opened this issue 1 year ago • 1 comment

It looks like the policy and the discriminator are trained together, at the same rate, with a single optimizer and a combined loss (https://github.com/nv-tlabs/ASE/blob/21257078f0c6bf75ee4f02626260d7cf2c48fee0/ase/learning/ase_agent.py#L280C1-L280C1). This seems to differ from the pseudocode in the paper, where the two are trained separately. Any idea what the reason for this is? Or am I missing something?
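To make the two setups concrete, here is a toy sketch in plain Python. It is not the ASE code: `policy_loss`, `disc_loss`, `disc_coef`, and the scalar parameters `theta` and `phi` are hypothetical stand-ins. It assumes plain SGD and that each loss term depends only on its own parameters. Under those assumptions, one optimizer on a weighted sum of losses produces the same updates as two optimizers whose learning rates absorb the weights.

```python
# Toy sketch (not the ASE code): two scalar "networks" with disjoint parameters.
# theta stands in for the policy, phi for the discriminator. Both losses are
# hypothetical quadratics chosen only so the updates are easy to compare.

def policy_loss(theta):
    return (theta - 1.0) ** 2

def disc_loss(phi):
    return (phi + 2.0) ** 2

def grad(f, x, eps=1e-6):
    # Central finite-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def joint_step(theta, phi, lr, disc_coef):
    # One optimizer, one combined loss: L = L_policy + disc_coef * L_disc.
    def combined(t, p):
        return policy_loss(t) + disc_coef * disc_loss(p)
    g_theta = grad(lambda t: combined(t, phi), theta)
    g_phi = grad(lambda p: combined(theta, p), phi)
    return theta - lr * g_theta, phi - lr * g_phi

def separate_step(theta, phi, lr_policy, lr_disc):
    # Two optimizers, each seeing only its own loss.
    new_theta = theta - lr_policy * grad(policy_loss, theta)
    new_phi = phi - lr_disc * grad(disc_loss, phi)
    return new_theta, new_phi

def run(step, n_steps, **kwargs):
    # Apply the same update rule n_steps times from a fixed starting point.
    theta, phi = 0.0, 0.0
    for _ in range(n_steps):
        theta, phi = step(theta, phi, **kwargs)
    return theta, phi

# lr_disc = lr * disc_coef, so the two schedules should coincide.
joint = run(joint_step, 20, lr=0.1, disc_coef=2.0)
separate = run(separate_step, 20, lr_policy=0.1, lr_disc=0.2)
print(joint, separate)
```

Whether this equivalence carries over to the linked code depends on details the toy ignores, such as shared layers, gradient clipping across all parameters, adaptive optimizers, or differing update frequencies, so it frames the question rather than answering it.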

Jan 17 '24 22:01 zmccmzty

I have the same question here ...

Jan 22 '24 12:01 Winston-Gu
