StimulateX
It seems that the parameters of the target_critic network should be updated as target_critic = tau*critic + (1-tau)*target_critic, while the actual critic network is updated according to the gradient of the loss function....
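To make the suggested rule concrete, here is a minimal sketch of that soft update in plain Python. The `soft_update` helper and the toy parameter dictionaries are hypothetical and only stand in for real network weights; the gradient step on the critic itself is not shown.

```python
def soft_update(target_params, online_params, tau):
    """Blend online parameters into the target parameters in place.

    Applies target = tau * online + (1 - tau) * target element-wise.
    Both arguments are dicts mapping a name to a list of floats
    (an assumption made for this sketch, not a real framework API).
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    for name, online_values in online_params.items():
        target_values = target_params[name]
        for i, w in enumerate(online_values):
            target_values[i] = tau * w + (1.0 - tau) * target_values[i]


# Hypothetical toy parameters standing in for real network weights.
critic = {"w": [1.0, 2.0], "b": [0.5]}
target_critic = {"w": [0.0, 0.0], "b": [0.0]}

# Only the target moves; the critic is left untouched by this step.
soft_update(target_critic, critic, tau=0.5)
print(target_critic)
```

With a small tau the target network trails the critic slowly, which is the usual reason for keeping the two updates separate.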