Gradient clipping and reward normalization parameters

Open danijar opened this issue 9 years ago • 0 comments

Hi there, cool project! I'm trying to reproduce the A3C results with my own implementation and have two questions about the parameters on the Wiki page that Dr. Mnih confirmed: (1) The Wiki says there was no loss clipping. However, the A3C paper does mention gradient clipping, which I believe is very similar. (2) In the original DQN paper, rewards were normalized by sign(R(s)) rather than by max(0, min(R(s), 1)) as listed in the Wiki. Could you please clarify these two points?
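To make the two points concrete, here is a minimal sketch of what I mean. The function names are my own and hypothetical, not taken from either paper or the Wiki. The gradient clipping shown is rescaling by global L2 norm, which is one common form; that the A3C paper uses exactly this form is an assumption on my part.

```python
import math


def sign_reward(r):
    """Reward transform as I read the DQN paper: sign(r), giving -1, 0, or 1."""
    return (r > 0) - (r < 0)


def clamp01_reward(r):
    """Reward transform as listed on the Wiki: max(0, min(r, 1))."""
    return max(0, min(r, 1))


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    This acts on the gradients, not on the loss value, which is why it is
    similar to loss clipping but not the same thing.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    # The two reward transforms disagree on negative rewards:
    # sign() keeps the penalty, the clamp discards it.
    for r in (-5, -1, 0, 1, 3):
        print(r, sign_reward(r), clamp01_reward(r))
```

The two transforms agree for non-negative integer rewards and differ only for negative ones, so the choice matters in games that give penalties.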

Jul 22 '16 13:07 danijar