pytorch-REINFORCE
Please add some explanation
Hi,
Thank you for the sample code. I could not understand what exactly is happening here: https://github.com/JamesChuanggg/pytorch-REINFORCE/blob/master/reinforce_discrete.py#L52
If possible, can you please give a little explanation?
Thanks
It's just maximizing the objective: the loss is the negative of the return-weighted log-probabilities, so minimizing it with gradient descent maximizes the policy's expected return.
This is where the loss is being calculated. If you look at the algorithm presented in Sutton's book (page 289), it is slightly different from what is given here, which is closer to Deep RL - Policy Gradients (page 34).
Basically, instead of applying an update step after calculating each advantage * grad log pi term, we calculate all the terms and then sum them into a single loss so that we can call backward() on it once. I am not sure what the theoretical differences are between applying t updates per episode vs. 1 update per episode, but I am currently looking into it.
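To make that concrete, here is a minimal sketch of the "sum everything, then one backward()" pattern. This is a hypothetical helper written for illustration, not the code from the repo; `reinforce_loss`, its arguments, and the default `gamma` are my own naming.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Accumulate the REINFORCE loss over one whole episode.

    log_probs: list of scalar tensors log pi(a_t | s_t), one per step
    rewards:   list of floats r_t, one per step
    (hypothetical helper for illustration, not from the repo)
    """
    # Compute discounted returns G_t by sweeping backwards over rewards
    R = 0.0
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()

    # Instead of t separate updates, sum the per-step terms
    # -G_t * log pi(a_t | s_t); a single backward() then differentiates
    # the whole sum, which is the same gradient as summing the per-step
    # gradients.
    loss = 0.0
    for log_prob, G in zip(log_probs, returns):
        loss = loss - log_prob * G
    return loss
```

After building the loss this way you would do the usual `optimizer.zero_grad()`, `loss.backward()`, `optimizer.step()` once per episode.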
Also confused about the entropies in the loss function, can anyone make a little explanation ?
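As far as I understand it, the entropy terms are an exploration bonus: subtracting a small multiple of the policy's entropy from the loss penalizes overly peaked action distributions, so the policy keeps exploring longer. A sketch of what such a term looks like (the function name and the coefficient `beta` are my own; the repo may use a different value):

```python
import torch

def entropy_bonus(probs, beta=1e-4):
    """Entropy of a categorical policy distribution, scaled by beta.

    probs: 1-D tensor of action probabilities pi(. | s), summing to 1
    (hypothetical helper for illustration, not from the repo)
    """
    # H(pi) = -sum_a pi(a|s) * log pi(a|s)
    entropy = -(probs * torch.log(probs)).sum()
    return beta * entropy
```

In the episode loss you would then use something like `loss = loss - entropy_bonus(probs_t)` at each step, so that minimizing the loss also maximizes entropy, trading off a little bit of greediness for exploration.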