
Intrinsic reward calculation, sum or mean?

Open aklein1995 opened this issue 4 years ago • 2 comments

Hi!

I have a question about how the intrinsic rewards are calculated. Why do you use sum(1) instead of mean(1)? https://github.com/jcwleo/random-network-distillation-pytorch/blob/e383fb95177c50bfdcd81b43e37c443c8cde1d94/agents.py#L76

That sums the squared errors over the 512 output neurons, which is different from taking the mean over those outputs.
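For concreteness, here is a small sketch of the two variants (the tensor names are hypothetical, chosen to mirror the 512-dim feature outputs of the RND target and predictor networks). The two reductions differ only by a constant scale factor:

```python
import torch

# Hypothetical batch of 4 states with 512-dim feature outputs,
# standing in for the RND target and predictor network outputs.
torch.manual_seed(0)
target_feature = torch.randn(4, 512)
predict_feature = torch.randn(4, 512)

sq_err = (target_feature - predict_feature) ** 2

reward_sum = sq_err.sum(1) / 2   # variant used in agents.py (sum over features, halved)
reward_mean = sq_err.mean(1)     # variant matching the original TF reduce_mean

# They differ only by the constant factor 512 / 2 = 256 per reward.
assert torch.allclose(reward_sum, reward_mean * 512 / 2)
```

So per state the sum(1) / 2 reward is exactly 256 times the mean(1) reward; the reductions never change the ordering of rewards across states, only their scale.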

In the original TensorFlow release they use reduce_mean, so I'm a bit confused. https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241

I hope you can clarify this. Thank you in advance!

aklein1995 avatar Jul 30 '21 07:07 aklein1995

Have you figured this out yet? I am also confused here; it differs from calculating the MSE. I also wonder why it is divided by 2 rather than by n as in the MSE.
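Regarding the division by 2: a common reason for the 1/2 factor (this is my reading, not something stated in the repo) is that it cancels the 2 that appears when differentiating the square, so the gradient with respect to the prediction is simply the error. A quick check:

```python
import torch

# Hypothetical 512-dim predictor output and fixed target output.
torch.manual_seed(0)
pred = torch.randn(512, requires_grad=True)
target = torch.randn(512)

# Half sum-of-squares: the 1/2 cancels the 2 from d/dx of x^2,
# leaving the gradient as exactly (pred - target).
loss = ((target - pred) ** 2).sum() / 2
loss.backward()

assert torch.allclose(pred.grad, pred - target)
```

With a plain MSE (divide by n) the gradient would instead carry a 2/n factor, which amounts to rescaling the learning rate; none of these constants changes which states look novel.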

Thanks in advance

FlaminG0 avatar Jun 20 '23 09:06 FlaminG0

Could it be that this difference does not matter because we use reward_rms to normalize the intrinsic rewards?
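That seems plausible: since sum(1) / 2 and mean(1) differ only by a constant factor, dividing each reward stream by a running standard deviation (as reward_rms does, applied to intrinsic returns in the repo) cancels that factor. A minimal sketch, using a plain batch std in place of the running estimate:

```python
import torch

# Hypothetical squared errors for 1000 states over 512 features.
torch.manual_seed(0)
sq_err = torch.randn(1000, 512) ** 2

r_sum = sq_err.sum(1) / 2   # repo's reduction
r_mean = sq_err.mean(1)     # original TF reduction

# Dividing by each stream's own std removes any constant scale factor,
# so the normalized rewards coincide (up to floating-point error).
norm_sum = r_sum / r_sum.std()
norm_mean = r_mean / r_mean.std()

assert torch.allclose(norm_sum, norm_mean, atol=1e-5)
```

The same cancellation holds for the discounted intrinsic returns the repo actually feeds to its RunningMeanStd, since returns are linear in the rewards.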

cangozpi avatar Mar 05 '24 19:03 cangozpi