maddpg How or why the gaussian distribution contributes to the training?

Open Chen-Joe-ZY opened this issue 6 years ago • 4 comments

It's interesting that the code splits the output of the actor network into a mean and a standard deviation, and then constructs a new action from a Gaussian distribution. In the past, there was always an extra noise factor that decreased gradually to control the added noise. I wonder if you can explain how or why this works :)
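To make the comparison concrete, here is a minimal NumPy sketch of the two exploration schemes described above. It is not taken from the repository: the function names, the decay schedule, and the assumption that the second half of the actor output is a log standard deviation are all hypothetical.

```python
import numpy as np

def decayed_noise_action(mu, step, sigma0=0.3, decay=0.999, rng=None):
    """Deterministic action plus external noise whose scale shrinks over time."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sigma0 * decay ** step  # hand-tuned schedule (assumed), not learned
    return mu + sigma * rng.standard_normal(mu.shape)

def gaussian_head_action(net_out, rng=None):
    """Split the actor output into mean and log-std, then sample an action."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: first half of the output is the mean, second half the log-std.
    mean, log_std = np.split(net_out, 2, axis=-1)
    std = np.exp(log_std)  # noise scale comes from the network, per state
    return mean + std * rng.standard_normal(mean.shape)
```

In the first scheme the noise scale depends only on the training step; in the second it is an output of the network, so it can differ from state to state and change as the weights are updated.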

May 24 '18 03:05 Chen-Joe-ZY

I would also be interested in an answer. In fact, I have difficulty understanding how this works: as I understand it, during the actor update we compute the gradient of the Q-value with respect to an action sampled from the Gaussian distribution (which is thus stochastic), yet we apply the deterministic policy gradient. How can that be? Thanks

Jun 21 '18 19:06 PBarde

I couldn't agree more. The code samples actions from a distribution, which departs from the deterministic property.

Feb 02 '19 14:02 GoingMyWay

I have the same problem. Any progress since then? Many thanks

Apr 24 '19 03:04 suiguoxin

I think the Gaussian distribution is used for exploring the action space. The action is not sampled by the Gaussian distribution as such; rather, the value of the action is determined by the Gaussian distribution. The standard deviation, which is an output of the DNN, determines the scope of exploration. These are my personal views; any corrections are welcome.

Nov 02 '19 13:11 EastVolcano
