Mengchao
It looks like the policy and the discriminator are trained together at the same rate, with a single optimizer and a combined loss (https://github.com/nv-tlabs/ASE/blob/21257078f0c6bf75ee4f02626260d7cf2c48fee0/ase/learning/ase_agent.py#L280C1-L280C1). This seems to differ from the pseudocode...
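
To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not taken from the ASE code: the quadratic losses, the function names, and the `disc_coef` weight are all hypothetical, chosen only for illustration. It contrasts one plain SGD step on a combined loss (one shared learning rate) with separate per-network updates (one learning rate each).

```python
# Toy illustration only: one scalar "policy" parameter (theta) and one scalar
# "discriminator" parameter (phi), each with its own quadratic loss.
# All names and losses here are hypothetical, not from the ASE repository.

def policy_loss_grad(theta):
    # Gradient of the toy policy loss (theta - 1)^2 with respect to theta.
    return 2.0 * (theta - 1.0)


def disc_loss_grad(phi):
    # Gradient of the toy discriminator loss (phi + 2)^2 with respect to phi.
    return 2.0 * (phi + 2.0)


def joint_step(theta, phi, lr, disc_coef):
    """One plain SGD step on the combined loss L_policy + disc_coef * L_disc.

    A single learning rate is shared by both parameters; the only thing that
    changes the discriminator's step size relative to the policy's is disc_coef.
    """
    new_theta = theta - lr * policy_loss_grad(theta)
    new_phi = phi - lr * disc_coef * disc_loss_grad(phi)
    return new_theta, new_phi


def separate_step(theta, phi, lr_policy, lr_disc):
    """One plain SGD step per network, each with its own learning rate."""
    new_theta = theta - lr_policy * policy_loss_grad(theta)
    new_phi = phi - lr_disc * disc_loss_grad(phi)
    return new_theta, new_phi


joint = joint_step(0.0, 0.0, lr=0.1, disc_coef=5.0)
separate = separate_step(0.0, 0.0, lr_policy=0.1, lr_disc=0.5)
print("joint:   ", joint)
print("separate:", separate)
```

In this toy, the two schemes take the same step when `lr_disc` equals `lr * disc_coef`, so under plain SGD the loss weight acts like a relative learning rate. That equivalence relies on the plain SGD assumption and on the two losses not sharing parameters; with an adaptive optimizer such as Adam, rescaling one loss term generally does not translate into a proportional change in step size, so a combined-loss setup and separately optimized updates can behave differently.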