maddpg
Q divergence
Hello! I am working to implement MADDPG in PyTorch based on the details of this TensorFlow implementation. I have followed the implementation to a tee, but when I remove the regularization on the policy logits, my Q values diverge. When I remove the same regularization term in your implementation, this does not occur. Did you experience this divergence issue? Was it a matter of tuning to fix, or does this indicate an issue with my implementation? Thank you.
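
For concreteness, here is a minimal, self-contained sketch of the actor update I am describing. The networks, dimensions, and Gumbel-softmax sampling are illustrative stand-ins for my actual code; the part in question is the `p_reg` term, which squares the raw logits the way the TF implementation does. Removing it is what triggers the divergence:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative dimensions, not my real ones.
obs_dim, act_dim, n_agents, batch = 8, 4, 2, 32

# Stand-in networks: per-agent actor, centralized critic over all obs/actions.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
critic = nn.Sequential(
    nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(), nn.Linear(64, 1)
)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

# Dummy batch of joint observations and actions.
obs = torch.randn(batch, n_agents, obs_dim)
acts = torch.randn(batch, n_agents, act_dim)

logits = acts.new_empty(0)  # placeholder so the name exists before assignment
logits = actor(obs[:, 0])                          # this agent's raw policy logits
acts = acts.clone()
acts[:, 0] = F.gumbel_softmax(logits, hard=True)   # differentiable action sample

# Policy gradient loss through the centralized critic.
pg_loss = -critic(torch.cat([obs.flatten(1), acts.flatten(1)], dim=1)).mean()
p_reg = (logits ** 2).mean()       # the logit regularization term in question
loss = pg_loss + 1e-3 * p_reg      # 1e-3 coefficient as in the TF implementation

actor_opt.zero_grad()
loss.backward()
actor_opt.step()
```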