StimulateX

Results: 1 comment by StimulateX

It seems that the parameters of the target_critic network should be updated as target_critic = tau*critic + (1-tau)*target_critic, and the actual critic network is updated according to the gradient of the loss function....
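For illustration, the soft update described in the comment could be sketched roughly as follows. This is a minimal, framework-free sketch: the function name `soft_update`, the dictionary-of-lists parameter layout, and the value of `tau` are assumptions made for the example, not taken from the code under discussion. The gradient step on the critic itself is not shown.

```python
def soft_update(target, online, tau):
    """Return tau * online + (1 - tau) * target for every parameter.

    `target` and `online` are hypothetical dicts mapping a parameter
    name to a flat list of floats with matching shapes.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    return {
        name: [
            tau * w + (1.0 - tau) * t
            for w, t in zip(online[name], target[name])
        ]
        for name in target
    }


# Toy parameters standing in for the critic and target_critic networks.
critic = {"w": [1.0, 2.0], "b": [0.5]}
target_critic = {"w": [0.0, 0.0], "b": [0.0]}

# The target moves a fraction tau of the way toward the critic;
# the critic itself is left untouched by this step.
target_critic = soft_update(target_critic, critic, tau=0.5)
```

With a small `tau` the target network trails the critic slowly, which is the usual motivation for this kind of update; `tau = 1` would copy the critic outright and `tau = 0` would freeze the target.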