[BUG] Optimizers for critic and policy both backpropagating on observation network

Open EdanToledo opened this issue 2 years ago • 1 comments

Describe the bug: In MAD4PG, the critic optimizer and the policy optimizer each train the observation network separately, which can cause the learned representation to be unstable.

Expected behavior: Only one optimizer should update the observation network, or there should be a single optimizer for all networks.
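To make the difference concrete, here is a minimal sketch in plain Python, not Mava or TensorFlow code. It assumes a toy setup with scalar weights, hand-derived gradients, and squared-error losses standing in for the real policy and critic losses; every name and value in it (`w_obs`, `w_pi`, `w_q`, `train_step`, the targets, the learning rate) is hypothetical and chosen only for illustration.

```python
# Toy illustration only: scalar "networks" with hand-derived gradients.
# All names, values and the squared-error losses are hypothetical, not Mava code.

def forward_and_grads(p, x=1.0, a_target=0.0, q_target=0.5):
    """Return the gradients of a toy policy loss and a toy critic loss."""
    e = p["w_obs"] * x  # shared observation embedding
    a = p["w_pi"] * e   # policy head output
    q = p["w_q"] * e    # critic head output
    # L_pi = (a - a_target)^2 and L_q = (q - q_target)^2
    policy_grads = {
        "w_obs": 2 * (a - a_target) * p["w_pi"] * x,
        "w_pi": 2 * (a - a_target) * e,
    }
    critic_grads = {
        "w_obs": 2 * (q - q_target) * p["w_q"] * x,
        "w_q": 2 * (q - q_target) * e,
    }
    return policy_grads, critic_grads


def sgd_step(p, grads, owned, lr=0.1):
    """Plain SGD applied only to the parameter names in `owned`."""
    return {k: (v - lr * grads[k] if k in owned else v) for k, v in p.items()}


def train_step(p, policy_owned, critic_owned):
    """One training step: a single forward pass, then one step per optimizer."""
    policy_grads, critic_grads = forward_and_grads(p)
    p = sgd_step(p, policy_grads, policy_owned)
    p = sgd_step(p, critic_grads, critic_owned)
    return p


init = {"w_obs": 1.0, "w_pi": 1.0, "w_q": 1.0}

# Reported behaviour: both optimizers include the observation weight.
both = train_step(init, {"w_obs", "w_pi"}, {"w_obs", "w_q"})

# One possible fix: only the critic optimizer owns the observation weight.
critic_only = train_step(init, {"w_pi"}, {"w_obs", "w_q"})

print(both["w_obs"], critic_only["w_obs"])
```

In both runs the policy and critic head weights end up the same; only the shared observation weight differs, because it is stepped twice per training step in the first case and once in the second. Which optimizer should own it in a real system is a separate design decision that this sketch does not settle.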

EdanToledo • Jan 28 '22 08:01

@arnupretorius - Adding the option for a single optimizer is easy and I will do it now. From a design standpoint, though, which optimizer should update the observation network in the two-optimizer case? Either way, as long as the critic and policy both use the observation network, there is still potential for instability.

EdanToledo • Feb 01 '22 10:02

Closing all TF issues as we are deprecating our TF systems.

DriesSmit • Sep 08 '22 14:09