random-network-distillation-pytorch
Extrinsic reward clipping
On page 15, the RND paper mentions that extrinsic rewards are clipped to [-1, 1]. However, the official RND code clips extrinsic rewards in atari_wrappers.py using the ClipRewardEnv wrapper, which does:
"""Bin reward to {+1, 0, -1} by its sign."""
return float(np.sign(reward))
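For reference, a minimal sketch of what such a sign-binning wrapper looks like (modeled on the baselines-style atari_wrappers; not copied verbatim from the official RND repo):

```python
import gym
import numpy as np


class ClipRewardEnv(gym.RewardWrapper):
    """Bin reward to {+1, 0, -1} by its sign."""

    def __init__(self, env):
        gym.RewardWrapper.__init__(self, env)

    def reward(self, reward):
        # Any positive reward becomes +1, any negative reward -1, zero stays 0.
        return float(np.sign(reward))
```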
I believe the implementation and the explanation in the paper differ slightly. In your implementation (jcwleo), you clip by doing:
total_reward = total_reward.reshape([num_step, num_env_workers]).transpose().clip(-1, 1)
I believe this is different from the official implementation. Does anyone have an explanation for this discrepancy and which variant should be used?
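To make the difference concrete, here is a small illustration (my own example, not taken from either codebase) comparing the two transforms on a handful of reward values:

```python
import numpy as np

rewards = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

sign_binned = np.sign(rewards)     # official wrapper: bins to {-1, 0, +1}
clipped = np.clip(rewards, -1, 1)  # this repo: clip(-1, 1)

print(sign_binned)  # [-1. -1.  0.  1.  1.]
print(clipped)      # [-1.  -0.5  0.   0.5  1. ]
```

As far as I can tell, the two coincide for integer-valued rewards (which covers most Atari games), and only differ for fractional rewards strictly inside (-1, 1), where sign binning rounds them to ±1 while clipping leaves them unchanged.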