Mava
[BUG] Optimizers for critic and policy both backpropagating on observation network
Describe the bug Both optimizers train the observation network separately, which can cause the learned representation to be unstable. Happens in MAD4PG.
Expected behavior Only one optimizer should update the observation network, or there should be a single optimizer for both.
@arnupretorius - Adding the option for a single optimizer is easy and I will do it now. But from a design standpoint, which optimizer should optimize the observation network in the double-optimizer case? Either way, if the critic and policy both update it, it will potentially cause instability.
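A minimal sketch of one possible fix in the double-optimizer case: let only the policy optimizer update the shared observation network, and filter observation-network gradients out of the critic's update (the effect of wrapping the critic's input in `tf.stop_gradient`). The variable names and `split_gradients` helper below are illustrative, not Mava's actual API.

```python
def split_gradients(grads, shared_vars):
    """Drop gradients for shared observation-network variables,
    so this optimizer leaves the shared representation untouched."""
    return {name: g for name, g in grads.items() if name not in shared_vars}

# Hypothetical per-variable gradients from the two losses.
shared_vars = {"obs_net/w"}
policy_grads = {"obs_net/w": 0.1, "policy/w": 0.2}
critic_grads = {"obs_net/w": -0.3, "critic/w": 0.4}

# The policy optimizer trains the observation network as usual...
policy_update = policy_grads
# ...while the critic optimizer sees it as frozen.
critic_update = split_gradients(critic_grads, shared_vars)
```

With this split, only one gradient signal ever reaches the observation network, avoiding the tug-of-war between the two optimizers described above.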
Closing all TF issues as we are deprecating our TF systems.