modular_rl Wlast.set_value(Wlast.get

Hi John,

I have read your TRPO paper and am trying to reproduce the Fisher-vector product calculation in C. Lines 36-37 in agentzoo.py confuse me. I copied the weights into my code, fed ob_no into the network, and checked its outputs against prob_np. It turned out that the mean values in prob_np are the original neural network outputs, not multiplied by 0.1. (I use the Theano backend, the Swimmer-v1 test case, and an 8-64-64-2 network.) Also, the *0.1 scaling is not mentioned in the TRPO paper. Could you shed some light on this?

    Wlast = net.layers[-1].W
    Wlast.set_value(Wlast.get_value(borrow=True)*0.1)
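
For reference, a minimal NumPy sketch of what those two lines appear to do: they rescale the stored last-layer weights once, rather than multiplying the network output by 0.1 on every forward pass. That would be consistent with the observation above, since weights copied afterwards already include the factor. The tanh hidden activations, zero biases, random initialization, and the names `forward`, `weights`, and `ob` are assumptions made for illustration and are not taken from agentzoo.py.

```python
import numpy as np

def forward(ob, weights):
    """Forward pass of an MLP with tanh hidden layers and a linear output layer.

    `weights` is a list of (W, b) pairs. The tanh activation is an assumption.
    """
    h = ob
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)
    W_last, b_last = weights[-1]
    return h @ W_last + b_last

rng = np.random.default_rng(0)
sizes = [8, 64, 64, 2]  # the 8-64-64-2 network from the post

# Hypothetical initialization: scaled Gaussian weights, zero biases.
weights = [
    (rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]

ob = rng.standard_normal((5, 8))  # stand-in for ob_no
mean_before = forward(ob, weights)

# One-time rescale of the stored last-layer weights, mirroring
# Wlast.set_value(Wlast.get_value(borrow=True) * 0.1).
W_last, b_last = weights[-1]
weights[-1] = (W_last * 0.1, b_last)
mean_after = forward(ob, weights)

# Weights copied out after the rescale already contain the 0.1 factor,
# so a reimplementation must not multiply the output by 0.1 again.
copied = [(W.copy(), b.copy()) for W, b in weights]
mean_from_copy = forward(ob, copied)
```

With zero output biases, `mean_after` equals `0.1 * mean_before`, and `mean_from_copy` matches `mean_after` without any extra scaling.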

Thank you in advance!

Thanks, Patrick

Feb 25 '17 18:02 sjshao09

Oh yeah, that's a known bug in my code. I haven't looked at this code in a while. Are you sure it's not mentioned? I thought we said something about that.

Feb 25 '17 18:02 joschu

Hi John, thanks for your quick reply! I read the latest version (v4) of the TRPO paper, downloaded from arXiv, from beginning to end, and it doesn't seem to mention that... Patrick

Feb 25 '17 18:02 sjshao09

OK, I'll add that in the next draft.

Feb 25 '17 18:02 joschu

Hi,

I'm interested in the reasoning for this 0.1 multiplier as well. It seems it was never added to the draft? If it was, would you mind pointing me to it or explaining the reasoning here?

Jul 11 '17 20:07 Breakend