Mava
[BUG] Optimizers for critic and policy both backpropagating on observation network
Describe the bug Both optimizers train the observation network separately, which can cause the learned representation to be unstable. Happens in MAD4PG.
Expected behavior Only one optimizer should update the observation network, or there should be a single optimizer for both.
@arnupretorius - Adding the option for a single optimizer is easy and I will do it now. But from a design standpoint, which optimizer should optimize the observation network in the double-optimizer case? Either way, if the critic and policy both update it, it will potentially cause instability.
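A minimal sketch of one possible fix in the double-optimizer case: let only the policy optimizer update the shared observation network, and filter observation-network gradients out of the critic's update (the effect of wrapping the critic's input in `tf.stop_gradient`). The variable names and `split_gradients` helper below are illustrative, not Mava's actual API.

```python
def split_gradients(grads, shared_vars):
    """Drop gradients for shared observation-network variables,
    so this optimizer leaves the shared representation untouched."""
    return {name: g for name, g in grads.items() if name not in shared_vars}

# Hypothetical per-variable gradients from the two losses.
shared_vars = {"obs_net/w"}
policy_grads = {"obs_net/w": 0.1, "policy/w": 0.2}
critic_grads = {"obs_net/w": -0.3, "critic/w": 0.4}

# The policy optimizer trains the observation network as usual...
policy_update = policy_grads
# ...while the critic optimizer sees it as frozen.
critic_update = split_gradients(critic_grads, shared_vars)
```

With this split, only one gradient signal ever reaches the observation network, avoiding the tug-of-war between the two optimizers described above.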
Closing all TF issues as we are deprecating our TF systems.