pytorch-trpo
It seems that the importance sampling part of the code is wrong.
https://github.com/ikostrikov/pytorch-trpo/blob/e200eb8a23b3c7941a0091efb9750dafa4b23cbb/main.py#L108-L119
The computation of `fixed_log_prob` and the corresponding expression inside the `get_loss` function are identical. Since the two are executed consecutively, before any parameter update, the two values (`fixed_log_prob` and `log_prob`) are exactly the same. Is there a reason you wrote the code like this?
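To make the observation concrete, here is a minimal, self-contained sketch (with hypothetical toy values, not the repo's actual code): when `fixed_log_prob` is snapshotted and `get_loss` is evaluated immediately afterwards with the same parameters, the importance ratio `exp(log_prob - fixed_log_prob)` comes out exactly 1 on that first evaluation.

```python
import math

# Hypothetical toy policy parameter and log-probability of one action.
theta = 0.5

def log_prob(p):
    # toy log-probability as a function of the parameter
    return -((1.0 - p) ** 2)

# Snapshot of the "old policy" log prob, analogous to fixed_log_prob.
fixed_log_prob = log_prob(theta)

# get_loss is called right afterwards with the SAME theta, so the
# importance ratio exp(log_prob - fixed_log_prob) is exactly 1 here.
ratio = math.exp(log_prob(theta) - fixed_log_prob)
advantage = 2.0
loss = -(ratio * advantage)
print(ratio, loss)  # 1.0 -2.0
```

Note that a ratio of 1 at the first evaluation does not by itself mean the loss is useless: if `get_loss` is re-evaluated later with updated parameters (e.g. during a line search) while `fixed_log_prob` stays frozen, the ratio departs from 1.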
`get_kl` also has this problem.