PyTorch-Counterfactual-Multi-Agent-Policy-Gradients-COMA

Leverage the use of recurrent modules

Open Fernadoo opened this issue 2 years ago • 0 comments

Hi, I am wondering whether the oscillation during training comes from the fact that your actor nets contain only down-sampling layers. In partially observable domains, an agent's information state should be its whole observation history rather than the current one-shot observation, so including a recurrent module such as an LSTM or GRU might help.
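
To make the suggestion concrete, here is a minimal sketch of what a recurrent actor could look like. It is not taken from this repository: the class name `RecurrentActor`, the layer sizes, and the `init_hidden` helper are all assumptions chosen for illustration. The idea is only that a GRU cell carries a hidden state across time steps, so the policy can depend on the observation history and not just the latest observation.

```python
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Hypothetical actor: linear encoder -> GRU cell -> action probabilities."""

    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # The GRU cell summarizes the observation history in its hidden state.
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)

    def init_hidden(self, batch_size: int = 1) -> torch.Tensor:
        # Zero hidden state, to be used at the start of every episode.
        return torch.zeros(batch_size, self.hidden_dim)

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        x = torch.relu(self.encoder(obs))
        h_next = self.gru(x, h)
        probs = torch.softmax(self.policy_head(h_next), dim=-1)
        return probs, h_next


# Usage: roll the actor over a short episode, threading the hidden state through.
actor = RecurrentActor(obs_dim=8, hidden_dim=32, n_actions=5)  # sizes are made up
h = actor.init_hidden(batch_size=1)
for _ in range(3):
    obs = torch.randn(1, 8)  # stand-in for the agent's current observation
    probs, h = actor(obs, h)
    action = torch.distributions.Categorical(probs).sample()
```

With a setup like this, the hidden state would need to be reset at episode boundaries, and the training loop would have to store or recompute hidden states when evaluating log-probabilities for the policy gradient update.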

Fernadoo, Apr 25 '22 07:04