batch-ppo

Assuring a non-zero increase of _penalty

Open · dchichkov opened this issue 6 years ago · 2 comments

Does it make sense to ensure that increasing _penalty always yields a non-zero value?

dchichkov · Jan 11 '19 04:01

Yes, this seems reasonable. Did you train an agent like this to see if it affects performance?

danijar · Jan 12 '19 03:01

In my environment I've seen _penalty go to exactly zero, and as a result the "increase penalty" logic can't raise it again, because a multiplicative increase of zero is still zero. I haven't done enough runs to tell whether it affects performance.
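A minimal sketch of the problem and one possible fix, assuming an adaptive KL penalty that is multiplied by a constant factor when the measured KL is above a target band and divided by it when below. The function name `adjust_penalty`, the 1.3x/0.7x band, the factor 1.5 and the `min_penalty` floor are illustrative assumptions, not batch-ppo's actual code.

```python
def adjust_penalty(penalty, kl, kl_target, factor=1.5, min_penalty=1e-6):
    """Hypothetical adaptive KL penalty update (not batch-ppo's actual code).

    The 1.3x / 0.7x thresholds, the factor and the floor are assumptions
    chosen for illustration.
    """
    if kl > 1.3 * kl_target:
        # A purely multiplicative increase leaves 0.0 at 0.0, so clamp the
        # result to a positive floor to guarantee a non-zero increase.
        return max(penalty * factor, min_penalty)
    if kl < 0.7 * kl_target:
        # Decrease; repeated division drives the penalty towards zero.
        return penalty / factor
    # KL is inside the target band: leave the penalty unchanged.
    return penalty


# With the floor disabled, a zero penalty can never recover:
print(adjust_penalty(0.0, kl=0.05, kl_target=0.01, min_penalty=0.0))  # 0.0
# With the floor, the next increase yields a usable coefficient:
print(adjust_penalty(0.0, kl=0.05, kl_target=0.01))  # 1e-06
```

Clamping only on the increase step lets the penalty decay freely while still guaranteeing that an "increase penalty" step produces a non-zero value.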

It may well be that it doesn't, in which case a sensible change would be to stop spending time computing the KL term once _penalty is zero!
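Skipping the KL term could look like the following minimal NumPy sketch, assuming a diagonal-Gaussian policy and a loss of the form negative surrogate plus `penalty` times the mean KL. `kl_diag_gaussian` and `policy_loss` are hypothetical names; in a TensorFlow graph the Python `if` would have to become a `tf.cond` on the penalty tensor.

```python
import numpy as np


def kl_diag_gaussian(mean0, logstd0, mean1, logstd1):
    """KL(N0 || N1) for diagonal Gaussians, summed over the last axis."""
    var0, var1 = np.exp(2 * logstd0), np.exp(2 * logstd1)
    kl = logstd1 - logstd0 + (var0 + (mean0 - mean1) ** 2) / (2 * var1) - 0.5
    return kl.sum(axis=-1)


def policy_loss(surrogate, penalty, old_mean, old_logstd, mean, logstd):
    """Hypothetical penalized surrogate loss that skips KL when unused."""
    if penalty == 0.0:
        # The penalty term would be multiplied by zero anyway, so avoid
        # computing the KL divergence at all.
        return -surrogate.mean()
    kl = kl_diag_gaussian(old_mean, old_logstd, mean, logstd)
    return -surrogate.mean() + penalty * kl.mean()
```

Any adaptive rule that raises the penalty still needs a KL measurement to decide when to do so, so this skip only removes the KL term from the loss, not the KL estimate used for adjusting the penalty.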

dchichkov · Jan 12 '19 06:01