Multi-Agent-Deep-Deterministic-Policy-Gradients
usage of critic_value_new[dones[:, 0.0]] = 0.0 in learn()
https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/blob/a3c294aa6834f348a7401306dff3e67919c861f5/maddpg.py#L74
Hi Phill,
Could you please help me understand what this line is for? critic_value_new[dones[:, 0.0]] = 0.0 Since critic_value_new is a float variable, it cannot be indexed like an array. Should we just set dones[agent_idx] to 0 instead?
Thanks and regards,
Viji
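Because the Q-value of a terminal state is 0. Once an episode has ended there are no future rewards to collect, so that line zeroes the next-state critic value for every transition in the batch that is flagged as done.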
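To make the masking concrete, here is a minimal sketch in NumPy rather than the repo's actual code. It assumes critic_value_new holds one value per sampled transition (a 1-D array, not a single float), that dones is a boolean array of shape (batch, n_agents), and that the column index is the integer 0. The numbers, gamma, and the shapes are invented for illustration.

```python
import numpy as np

# Hypothetical batch of 4 transitions for 2 agents (values invented for illustration).
gamma = 0.95
rewards = np.array([[1.0, 0.5], [0.0, 0.0], [2.0, 1.0], [-1.0, 0.0]])
dones = np.array([[False, False], [True, True], [False, False], [True, True]])

# Stand-in for the target critic's output: one next-state Q estimate per transition.
critic_value_new = np.array([10.0, 20.0, 30.0, 40.0])

# Boolean-mask assignment: every row whose done flag (column 0) is True
# has its next-state value set to 0, because a terminal state has no future return.
critic_value_new[dones[:, 0]] = 0.0

# TD target for one agent: reward plus discounted next-state value.
agent_idx = 0
target = rewards[:, agent_idx] + gamma * critic_value_new

print(critic_value_new)
print(target)
```

With the mask applied, the target for a finished transition reduces to its reward alone. Setting dones[agent_idx] to 0 would instead change the done flags themselves and leave the bootstrapped values untouched.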