batch-ppo
Assuring a non-zero increase of _penalty
Does it make sense?
Yes, this seems reasonable. Did you train an agent like this to see if it affects performance?
I've seen in my environment that _penalty does go to exactly zero, and as a result the "increase penalty" logic can no longer increase it. I haven't performed enough runs to tell whether it affects performance or not.
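To illustrate the problem, here is a minimal sketch in plain Python. It assumes the penalty is adjusted multiplicatively; the function name `update_penalty`, the factor 1.5, the 1.3/0.7 thresholds, and the `min_penalty` floor are hypothetical and chosen for illustration only, not taken from the batch-ppo code.

```python
def update_penalty(penalty, kl, kl_target, factor=1.5, min_penalty=0.0):
    """Multiplicative KL-penalty update (illustrative sketch, not batch-ppo's code).

    All thresholds and the `min_penalty` floor are assumptions made for
    this example.
    """
    if kl > 1.3 * kl_target:
        # KL too large: raise the penalty. Clamping to the floor first
        # guarantees a non-zero increase even if the penalty reached zero.
        return max(penalty, min_penalty) * factor
    if kl < 0.7 * kl_target:
        # KL comfortably small: lower the penalty.
        return penalty / factor
    return penalty


# Without a floor, a zero penalty stays at zero no matter how large the KL is.
stuck = update_penalty(0.0, kl=1.0, kl_target=0.01)

# With a small positive floor, the same update can recover.
recovered = update_penalty(0.0, kl=1.0, kl_target=0.01, min_penalty=1e-8)

print(stuck, recovered)
```

Since `0.0 * factor` is still `0.0`, a purely multiplicative rule cannot leave zero on its own; a small positive floor is one possible way to assure a non-zero increase.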
It may well be that it doesn't, in which case a sensible change would be to stop wasting time on calculating the KL term once _penalty is zero!