Wlast.set_value(Wlast.get_value(borrow=True)*0.1)

Open sjshao09 opened this issue 9 years ago • 4 comments

Hi John,

I have read your TRPO paper and I'm trying to reproduce the Fisher-vector product calculation in C. Lines 36-37 in agentzoo.py confuse me. I copied the weights into my code, fed ob_no into the network, and checked its outputs against prob_np. It turned out that the mean values in prob_np are the original neural network outputs, not multiplied by 0.1. (I use the Theano backend, the Swimmer-v1 test case, and an 8-64-64-2 network.) Also, the *0.1 scaling is not mentioned in the TRPO paper. Could you shed some light on this?

    Wlast = net.layers[-1].W
    Wlast.set_value(Wlast.get_value(borrow=True)*0.1)

Thank you in advance!

Thanks, Patrick

sjshao09, Feb 25 '17 18:02

Oh yeah, that's a deliberate trick in my code. I haven't looked at this code in a while. Are you sure it's not mentioned? I thought we said something about that.

joschu, Feb 25 '17 18:02

Hi John, Thanks for your quick reply! I read the latest version (v4) of the TRPO paper, downloaded from arXiv, from beginning to end, and it doesn't seem to mention that. Patrick

sjshao09, Feb 25 '17 18:02

OK, I'll add that in the next draft.

joschu, Feb 25 '17 18:02

Hi,

I'm interested in the reasoning for this 0.1 multiplier as well. It seems it was never added to the draft. If it was, would you mind pointing me to it? If not, could you explain the reasoning here?

Breakend, Jul 11 '17 20:07
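
A note for readers following the thread: the quoted line calls set_value on the last layer's weight variable, so it appears to rescale the stored weights once rather than multiply the output on every forward pass. That would be consistent with the observation in the first post that weights copied out of the model reproduce prob_np with no extra 0.1 factor. The pure-Python sketch below illustrates this under stated assumptions. The tanh hidden activations, the Gaussian initialization, the zero biases, and the helper names make_layer, affine, and forward are hypothetical and not taken from modular_rl; only the 8-64-64-2 shape comes from the thread.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (W, b): W has n_out rows of n_in weights, b is all zeros.

    The Gaussian initialization is an assumption made for illustration.
    """
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def affine(W, b, x):
    """Compute W x + b for one input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(layers, x):
    """tanh hidden layers (assumed) followed by a linear output layer."""
    h = x
    for W, b in layers[:-1]:
        h = [math.tanh(v) for v in affine(W, b, h)]
    W, b = layers[-1]
    return affine(W, b, h)


rng = random.Random(0)
sizes = [8, 64, 64, 2]  # the 8-64-64-2 shape mentioned in the thread
layers = [make_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

ob = [rng.uniform(-1.0, 1.0) for _ in range(sizes[0])]
mean_before = forward(layers, ob)

# One-time rescale of the stored last-layer weights, analogous to
#   Wlast.set_value(Wlast.get_value(borrow=True) * 0.1)
W_last, b_last = layers[-1]
layers[-1] = ([[0.1 * w for w in row] for row in W_last], b_last)
mean_after = forward(layers, ob)

# Copying the stored weights out yields the already-scaled values, so a
# plain forward pass matches the model output with no extra 0.1 factor.
exported = [([row[:] for row in W], b[:]) for W, b in layers]
mean_exported = forward(exported, ob)

print(mean_before, mean_after, mean_exported)
```

With zero output biases, mean_after is 0.1 times mean_before up to rounding, and mean_exported equals mean_after. The thread does not state why 0.1 was chosen, so the sketch shows only what the line does, not the reasoning behind it.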