random-network-distillation-pytorch
Intrinsic reward calculation, sum or mean?
Hi!
I have a question about how the intrinsic rewards are calculated. Why do you use sum(1) instead of mean(1)? https://github.com/jcwleo/random-network-distillation-pytorch/blob/e383fb95177c50bfdcd81b43e37c443c8cde1d94/agents.py#L76
That sums along the 512 output neurons, which is different from taking the mean along those outputs.
In the original TensorFlow release they use reduce_mean, so I'm a little confused. https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241
Hope you can clear this up for me. Thank you in advance!
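To make the difference concrete, here is a small sketch (using numpy as a stand-in for the tensors in agents.py; the array shapes and the 512 feature dimension are assumptions based on the predictor's output size). It shows that sum(1) and mean(1) over the feature axis differ only by the constant factor 512:

```python
import numpy as np

# Hypothetical feature dimension matching the 512-unit predictor output.
n_features = 512
rng = np.random.default_rng(0)

# Stand-in for per-sample squared errors between the target and
# predictor features, shape (batch, n_features).
sq_err = rng.random((4, n_features))

reward_sum = sq_err.sum(axis=1)    # what sum(1) in agents.py computes
reward_mean = sq_err.mean(axis=1)  # what reduce_mean in the TF release computes

# The two differ only by the constant factor n_features.
assert np.allclose(reward_sum, reward_mean * n_features)
```

So the two conventions give rewards on different scales, but the same relative ordering across states.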
Have you figured this out yet? I am also confused here; it differs from the standard MSE. I also wonder why the error is divided by 2 rather than by n as in the MSE.
Thanks in advance
Could it be that this difference doesn't matter because we use reward_rms to normalize the intrinsic rewards?
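That intuition can be checked with a quick sketch. Assuming the normalization divides the intrinsic reward by a running estimate of its standard deviation (here approximated by a batch std for simplicity; `sq_err` is a made-up stand-in for the per-feature squared errors), the constant factor between sum(1) and mean(1) cancels exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for per-feature squared prediction errors, shape (batch, 512).
sq_err = rng.random((1000, 512))

r_sum = sq_err.sum(axis=1)    # sum(1) convention
r_mean = sq_err.mean(axis=1)  # reduce_mean convention

# RND divides intrinsic rewards by a running std (reward_rms);
# a batch std serves as a proxy here.
norm_sum = r_sum / r_sum.std()
norm_mean = r_mean / r_mean.std()

# Dividing by the std cancels the constant factor of 512 (and likewise
# the factor 1/2), so both conventions yield identical normalized rewards.
assert np.allclose(norm_sum, norm_mean)
```

The same argument applies to the division by 2: any constant scale on the intrinsic reward is absorbed by the std normalization, so it only matters if the normalizer's running statistics haven't converged yet.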