PyTorch-Counterfactual-Multi-Agent-Policy-Gradients-COMA
Memory leak
It seems that running COMA2.py results in a memory leak: the program consumes all available memory after some episodes.
First check whether the leak comes from COMA or from envFindGoal: simply run random actions on envFindGoal. If that works without leaking, the problem is in COMA.
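A minimal random-action rollout for this kind of isolation test might look like the following. `DummyEnv` is a hypothetical stand-in for `env_FindGoals` (whose real interface I'm only assuming here); the point is just to drive the environment without any COMA code in the loop and watch memory.

```python
import random
import tracemalloc

class DummyEnv:
    """Hypothetical stand-in for env_FindGoals (its real API may differ)."""
    def __init__(self):
        self.steps = 0
    def reset(self):
        return [0, 0]  # one observation per agent
    def step(self, actions):
        self.steps += 1
        return [0, 0], [0.0, 0.0], False  # obs, rewards, done

def random_rollout(env, n_steps=1000, n_actions=5):
    # Drive the environment with random actions only: if memory still
    # grows without bound here, the leak is in the env, not in COMA.
    obs = env.reset()
    for _ in range(n_steps):
        actions = [random.randrange(n_actions) for _ in range(2)]
        obs, rewards, done = env.step(actions)
        if done:
            obs = env.reset()

tracemalloc.start()
env = DummyEnv()
random_rollout(env)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak memory during random rollout: {peak} bytes")
```

If the peak stays flat as you increase `n_steps` against the real env, the env is clean and the leak is on the COMA side.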
It seems that the memory leak is caused by the function cross_prod, and can be avoided by wrapping the call in with torch.no_grad():.
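A sketch of why that helps, with a stand-in `cross_prod` (I'm assuming the repo's function combines per-agent action distributions into a joint one; the real implementation may differ): calls made outside `no_grad` keep extending the autograd graph, and every tensor the graph references stays alive, which accumulates like a leak over episodes.

```python
import torch

def cross_prod(p1, p2):
    # Hypothetical stand-in for the repo's cross_prod: joint action
    # probabilities as the outer product of two per-agent distributions.
    return torch.outer(p1, p2)

p1 = torch.softmax(torch.randn(5, requires_grad=True), dim=0)
p2 = torch.softmax(torch.randn(5, requires_grad=True), dim=0)

# Without no_grad, the result is attached to the autograd graph, so the
# graph (and everything it references) is retained across calls.
leaky = cross_prod(p1, p2)
assert leaky.requires_grad

# Wrapping the call in torch.no_grad() detaches it from the graph, so
# intermediate buffers can be freed after each episode.
with torch.no_grad():
    safe = cross_prod(p1, p2)
assert not safe.requires_grad
```

This is only safe where you do not need gradients through `cross_prod`; if the result feeds a loss you backpropagate, the fix has to go elsewhere (e.g. releasing the graph after each update).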
I also need your help with another question!
In the definition in FindGoals.pdf, an episode ends once any one agent reaches its goal, while in env_FindGoals.py an episode ends only when the first agent arrives. I supposed that the goal of the 'FindGoals' task should be for both agents to find their goals. The policies learned by your code only drive one agent to its goal while the other stays at its start position.
You can simply modify the last several lines of the step() function in env_FindGoals to determine when the episode should stop.
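As a sketch of that change (the helper name and the position/goal representation here are my assumptions, not the repo's actual code), the stop condition can be factored out so switching between "any agent" and "both agents" termination is a one-flag change:

```python
def episode_done(positions, goals, require_all=True):
    # End-of-episode test for a two-agent FindGoals-style task.
    # require_all=True  -> episode ends only when BOTH agents are at their goals
    # require_all=False -> episode ends as soon as ANY agent reaches its goal
    reached = [pos == goal for pos, goal in zip(positions, goals)]
    return all(reached) if require_all else any(reached)

# Only agent 0 has arrived:
print(episode_done([(1, 1), (0, 0)], [(1, 1), (3, 3)]))                     # -> False
print(episode_done([(1, 1), (0, 0)], [(1, 1), (3, 3)], require_all=False))  # -> True
```

Calling something like this at the end of step() with `require_all=True` would make the episode run until both agents reach their goals, matching the reading of the task above.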