
Confused about PPO update

Open Githuber-zwb opened this issue 8 months ago • 0 comments

I'm a bit confused about the PPO update process. At line 110 (screenshot: Screenshot from 2024-06-06 11-21-26), the rewards within a single episode are normalized by subtracting the mean and dividing by the variance. Why should the rewards be scaled this way? I found that after normalization, some truly bad rewards are rescaled and important information about their magnitude is lost.
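To make the concern concrete, here is a minimal sketch of per-episode normalization. It is not the repository's code: the `normalize` helper, the `eps` term, the use of the population standard deviation as the divisor, and the sample rewards are all assumptions for illustration.

```python
from statistics import fmean, pstdev


def normalize(values, eps=1e-7):
    """Subtract the mean and divide by the (population) standard deviation.

    Hypothetical helper: the divisor and eps are assumptions, not taken
    from the repository.
    """
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


# Hypothetical episode: four ordinary steps and one very bad one.
rewards = [1.0, 1.0, 1.0, 1.0, -100.0]
normalized = normalize(rewards)

# The ordering is preserved, but the raw gap of 101 between the good and
# bad steps shrinks to roughly 2.5 after normalization.
print(normalized)
```

With these numbers the bad step stays the lowest value, but its absolute magnitude is gone: an episode whose worst reward is -100 and one whose worst reward is -10 can end up with very similar normalized values, which is the loss of information described above.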

Githuber-zwb · Jun 06 '24 03:06