Multi-Agent-Deep-Deterministic-Policy-Gradients usage of critic_value

usage of critic_value_new[dones[:, 0.0]] = 0.0 in learn()

Open · VijiKK opened this issue 3 years ago · 1 comment


https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/blob/a3c294aa6834f348a7401306dff3e67919c861f5/maddpg.py#L74

Hi Phil,

Could you please help me understand what this line is for?

critic_value_new[dones[:, 0.0]] = 0.0

Since critic_value_new is a float variable, it cannot be used as an array. Should we just set dones[agent_idx] to 0 instead?

Thanks and regards,
Viji

Jun 22 '22 03:06 VijiKK
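
A minimal sketch of what the line does. It assumes, as in a typical batched MADDPG update, that the target critic's output has been flattened to one next-state Q-value per sampled transition (so it is a 1-D array over the batch, not a single float) and that dones holds one boolean column per agent. NumPy stands in for PyTorch here; torch tensors support the same boolean-mask assignment. The batch values are made up for illustration. Note that the column index must be the integer 0: a float index such as 0.0 raises an IndexError in NumPy.

```python
import numpy as np

# Hypothetical batch of 4 transitions for 2 agents.
# dones has shape (batch_size, n_agents); True means the episode ended.
dones = np.array([[False, False],
                  [True,  True],
                  [False, False],
                  [True,  True]])

# Stand-in for the target critic's flattened output:
# one next-state Q-value per transition, shape (batch_size,).
critic_value_new = np.array([1.5, 2.0, -0.5, 3.0])

# dones[:, 0] takes agent 0's done flag for every sample -> shape (batch_size,).
# In this sketch all agents' flags are identical, so column 0 covers the team.
mask = dones[:, 0]

# Boolean-mask assignment zeroes only the entries where mask is True.
critic_value_new[mask] = 0.0
print(critic_value_new.tolist())  # [1.5, 0.0, -0.5, 0.0]
```

So the line does not index a scalar; it zeroes the next-state estimates of exactly those samples whose episode terminated.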

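Because the Q-value of a terminal state is 0. For every transition in the batch where the episode ended, the target critic's estimate for the next state is set to zero, so the target reduces to the reward alone.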

Aug 16 '22 08:08 Vishwanath1999
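
To see why zeroing is correct: the critic is trained toward the bootstrapped target y = r + gamma * Q'(s', a'), and after a terminal transition there is no next state to bootstrap from, so the target should be just r. The sketch below uses a hypothetical discount factor and made-up batch values to check that masking the next-state values in place gives the same targets as the common r + gamma * Q'(s', a') * (1 - done) formulation.

```python
import numpy as np

gamma = 0.95  # hypothetical discount factor

rewards = np.array([1.0, -1.0, 0.5, 2.0])     # one agent's reward per transition
done = np.array([False, True, False, True])   # did the episode end after this step?
next_q = np.array([1.5, 2.0, -0.5, 3.0])      # target critic's Q'(s', a')

# Form 1: zero terminal next-state values in place, then bootstrap.
masked_q = next_q.copy()
masked_q[done] = 0.0
target_masked = rewards + gamma * masked_q

# Form 2: multiply by (1 - done) instead of masking.
target_mult = rewards + gamma * next_q * (1.0 - done.astype(np.float64))

# Both forms agree; terminal samples keep only their reward.
assert np.allclose(target_masked, target_mult)
print(target_masked[done].tolist())  # [-1.0, 2.0]
```

The mask form edits the next-state values once before the target is built, while the multiplicative form folds the same condition into the target expression; the choice between them is a matter of style.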
