maddpg
maddpg copied to clipboard
How or why the gaussian distribution contributes to the training?
It's interesting that the code decomposes the output of actor network as the mean and the standard deviation, and then constructs a new action with a gaussian distribution. In past, there is always a extra noisy factor which will decrease gradually to control the adding noise. I wonder if you can explain how or why this will work :)
I will be also interested in an answer. As a matter of fact I have difficulties understanding how this work : from my understanding when doing the actor update we compute the gradient of the Q value with respect to an action sampled from the Gaussian distribution (which is thus stochastic) but we apply deterministic policy gradient. How can that be ? Thanks
I will be also interested in an answer. As a matter of fact I have difficulties understanding how this work : from my understanding when doing the actor update we compute the gradient of the Q value with respect to an action sampled from the Gaussian distribution (which is thus stochastic) but we apply deterministic policy gradient. How can that be ? Thanks
I can not agree more, the code sample actions from a distribution, which deviates its deterministic property.
I will be also interested in an answer. As a matter of fact I have difficulties understanding how this work : from my understanding when doing the actor update we compute the gradient of the Q value with respect to an action sampled from the Gaussian distribution (which is thus stochastic) but we apply deterministic policy gradient. How can that be ? Thanks
I can not agree more, the code sample actions from a distribution, which deviates its deterministic property.
I got the same problem, any progress since ? Many thanks
I think that gaussian distribution is used for exploring the action space. The action is not sampled by the gaussian distribution, but the value of action is determined by the gaussian distribution. The stddev is used to determine the scope of exploration,which is the output of DNN. Here are my personal views, welcome any correction.