minimal-marl
Doubt in train step
Hi, in this line:
https://github.com/koulanurag/minimal-marl/blob/bd4ac9360aa686f105d1c3a0f39192ecd824e69d/vdn.py#L101
shouldn't only the active agents contribute to the summation over agents? That is, shouldn't the `done` mask be applied per agent before summing: (max_q_prime * (1 - done[:, step_i])).sum(dim=1, keepdims=True)
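
To make the question concrete, here is a minimal sketch of the difference between the two sums. It assumes that `max_q_prime` and `done[:, step_i]` both have shape `(batch, n_agents)` and that `done` is a per-agent 0/1 flag; I have not checked this against the rest of `vdn.py`. The helper name `masked_team_bootstrap` and the toy values are made up for illustration.

```python
import torch


def masked_team_bootstrap(max_q_prime: torch.Tensor, done_step: torch.Tensor) -> torch.Tensor:
    """Sum next-state max Q-values over agents, counting only active agents.

    Assumed shapes (not confirmed from the repository):
      max_q_prime: (batch, n_agents)
      done_step:   (batch, n_agents), 1 where the agent is done, else 0
    Returns a tensor of shape (batch, 1).
    """
    active = 1.0 - done_step.float()  # 1 for active agents, 0 for finished ones
    return (max_q_prime * active).sum(dim=1, keepdim=True)


# Toy batch: 2 transitions, 3 agents (hypothetical values).
max_q_prime = torch.tensor([[1.0, 2.0, 3.0],
                            [4.0, 5.0, 6.0]])
done_step = torch.tensor([[0.0, 1.0, 0.0],   # agent 1 is done
                          [1.0, 1.0, 1.0]])  # all agents are done

masked = masked_team_bootstrap(max_q_prime, done_step)
unmasked = max_q_prime.sum(dim=1, keepdim=True)

print(masked)    # finished agents contribute nothing
print(unmasked)  # every agent contributes, done or not
```

With the mask, the first row sums only agents 0 and 2, and the second row bootstraps from nothing because every agent is done. Without it, finished agents still add their `max_q_prime` to the target.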