PyTorch-Counterfactual-Multi-Agent-Policy-Gradients-COMA
Did you forget about the reward from the second agent?
Hi... I'm new to COMA and PyTorch. As I read your code, it's impressive and useful. However, I saw that at line 165 you didn't include the reward from the second agent. I would like to know why...
That is because in this environment the two agents share the same reward. You can definitely record the rewards from the other agents as well.
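If you do want to keep each agent's reward instead of a single shared value, a rough sketch could look like the following (this is not the repo's code; the variable names and the dummy rewards are purely illustrative):

```python
# Rough sketch (illustrative, not the repo's code): store a tuple of per-agent
# rewards for every time step instead of a single shared scalar.
import random

T = 10                                         # dummy episode length
r_list = []                                    # r_list[t] -> (r_agent1, r_agent2)

for t in range(T):
    r1, r2 = random.random(), random.random()  # stand-in for the env's per-agent rewards
    r_list.append((r1, r2))                    # record rewards from both agents
```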
Hi! If I change the environment, e.g. to env_CatchPigs, in which the two agents have different rewards, what should be stored in r_list[t] (used in line 94)? The sum of the rewards of agent1 and agent2? I'm new to COMA too...
It seems that in this environment, the rewards of the two agents may be different at the end of an episode!
Seems so. But they surely can have different rewards.
I tried giving the two agents the same reward (the sum of their independent rewards), as in the COMA paper, and your implementation still works!
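For anyone else landing here, a minimal sketch of that idea (assumptions: the per-agent rewards r1 and r2 come from the environment each step, and r_list mirrors the variable discussed above rather than the actual repo code) could be:

```python
# Minimal sketch (illustrative): COMA is fully cooperative, so one option is to
# store a single team reward per step, e.g. the sum of the agents' rewards.
import random

T = 10                        # dummy episode length
r_list = [None] * T           # r_list[t] holds the shared team reward at step t

for t in range(T):
    # stand-in for env.step(...): per-agent rewards that may differ
    r1, r2 = random.random(), random.random()
    r_list[t] = r1 + r2       # shared team reward, as in the COMA paper
```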