Confused about PPO update
I'm a bit confused about the PPO update process. In line 110:
The rewards within a single episode are normalized by subtracting the mean and dividing by the standard deviation. Why should the rewards be scaled at all? It seems that after normalization, some genuinely bad rewards are rescaled and the important information they carry is lost.
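For reference, here is a minimal sketch of the kind of per-episode normalization I mean (the function name and NumPy usage are my own illustration, not the code from line 110):

```python
import numpy as np

def normalize_rewards(rewards, eps=1e-8):
    """Standardize an episode's rewards to zero mean and unit scale.

    Note how an absolutely bad reward (e.g. -10) becomes merely a
    *relative* value after this step, which is what my question is about.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

episode_rewards = [1.0, -10.0, 2.0, 3.0]
print(normalize_rewards(episode_rewards))
```

After this transform the magnitude of the original -10 reward is no longer distinguishable from a mildly below-average reward in another episode, since each episode is rescaled independently.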