awr
Why normalize the value function (vf)?
Hello,
thanks for the code. While re-implementing the program, I found that there is a step that normalizes the value function vf here. It is implemented as v_predict = v(s; \theta) * (1 - \gamma), and the critic update is implemented as

min_\theta [ v(s; \theta) * (1 - \gamma) - v_estimate ]^2.
Is there any reason to normalize the value function's output? I tried removing the normalization term and rescaling the learning rate (by 1 - \gamma), and there seems to be no problem on HalfCheetah-v2: it achieves performance similar to the original version.
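Concretely, the two critic updates I compared look roughly like the sketch below (PyTorch is used only for illustration; value_net, states, and the target names are placeholders, not this repo's actual identifiers):

```python
# A minimal sketch of the two critic losses being compared; not the
# actual code from this repo.
import torch

gamma = 0.99

def critic_loss_normalized(value_net, states, v_target_normalized):
    # Original form: the prediction is scaled by (1 - gamma) and regressed
    # against a target on the same normalized scale.
    v_predict = value_net(states).squeeze(-1) * (1.0 - gamma)
    return ((v_predict - v_target_normalized) ** 2).mean()

def critic_loss_unnormalized(value_net, states, v_target_raw):
    # My variant: drop the (1 - gamma) factor and regress against the raw
    # return estimate, compensating with a rescaled learning rate.
    v_predict = value_net(states).squeeze(-1)
    return ((v_predict - v_target_raw) ** 2).mean()
```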
Best,
The value scaling is mainly a convention; I generally like to keep things normalized between 0 and 1. Training should work just as well without the normalization, but it might need some tuning of the other hyperparameters, like the stepsize.