maddpg
Episode in cooperative navigation env
Hi, thank you for releasing the code. I have some questions about the 'done' condition in the cooperative navigation environment. I don't see any done function for the env; the only terminal condition I can find is the maximum number of time steps per episode.

1. Is reaching the maximum time step the only situation in which the env is done and we need to reset the world?
2. What happens once the agents cover the landmarks? Do they keep trying to cover the landmarks until the max time step is reached?
3. What is the max number of steps for the cooperative navigation results reported in Table 2 of the paper? Are the number of touches and the mean distance to landmarks calculated over that number of time steps?
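To make the question concrete, here is a minimal sketch of how I understand the episode loop. All names (`run_episode`, `step_fn`, `reset_fn`, `never_done_step`) are hypothetical stand-ins rather than the repository's API, and the step limit of 25 is an arbitrary placeholder, not a value taken from the paper.

```python
from typing import Callable, List, Tuple


def run_episode(
    step_fn: Callable[[], List[bool]],
    reset_fn: Callable[[], None],
    max_episode_len: int,
) -> Tuple[int, bool]:
    """Roll out one episode and report (steps taken, whether the step limit ended it).

    step_fn is assumed to advance the env by one step and return one
    'done' flag per agent; reset_fn is assumed to reset the world.
    """
    reset_fn()
    episode_step = 0
    while True:
        done_n = step_fn()
        episode_step += 1
        done = all(done_n)                           # env-reported termination
        terminal = episode_step >= max_episode_len   # step-limit termination
        if done or terminal:
            return episode_step, terminal


def never_done_step() -> List[bool]:
    # Hypothetical stand-in for an env with no done function:
    # every agent always reports done=False.
    return [False, False, False]


steps, hit_limit = run_episode(never_done_step, lambda: None, max_episode_len=25)
print(steps, hit_limit)
```

If the env never reports done, as in this stand-in, the episode can only end through the step limit, which is the situation questions 1 and 2 are asking about.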
Thank you in advance