maddpg
The results are not as good as the paper showed
I ran maddpg in simple_speaker_listener several times, but none of the runs reached the -20 average reward reported in the paper. Is there anything I should modify to get a better or more stable result?
Looks like you're not the only one having trouble reproducing some results: #12
I am getting rewards of -60. Is that normal when running the code without any alterations?
Also, with scenario=simple_speaker_listener, this code does not converge to the result reported in Fig. 4. Does anyone know what the problem is?