MountainCar_DQN_RND
I have an idea, but I don't know whether it will work.
I think the value RND returns does not work well as an intrinsic reward in DQN, but it could be used to select actions instead. To do that, the dimension of RND's output would be the action dimension, so there is one novelty value per action. Then, in DQN, you can change the action selection algorithm to use these values and achieve exploration that way.
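To make the idea concrete, here is a rough sketch of what I mean, not something I have tested on MountainCar. The class `PerActionRND`, the function `select_action`, and the weight `beta` are names I made up for illustration. As a simplification, the predictor is a linear layer on top of the target's fixed random features, and only the output component of the action actually taken is trained, so the error for rarely tried actions stays high.

```python
import numpy as np


class PerActionRND:
    """Hypothetical RND variant whose output has one component per action."""

    def __init__(self, state_dim, n_actions, hidden=32, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network (never trained).
        self.Wt1 = rng.normal(size=(state_dim, hidden))
        self.Wt2 = rng.normal(size=(hidden, n_actions))
        # Predictor head (assumption: linear, on the same random features).
        self.Wp = np.zeros((hidden, n_actions))
        self.lr = lr

    def _features(self, state):
        return np.tanh(np.asarray(state, dtype=float) @ self.Wt1)

    def novelty(self, state):
        """Squared prediction error per action, shape (n_actions,)."""
        h = self._features(state)
        return (h @ self.Wt2 - h @ self.Wp) ** 2

    def update(self, state, action):
        """One gradient step on the output component of the taken action."""
        h = self._features(state)
        err = h @ self.Wp[:, action] - h @ self.Wt2[:, action]
        self.Wp[:, action] -= self.lr * err * h


def select_action(q_values, novelty, beta=1.0):
    """Greedy action over Q-values plus a weighted per-action novelty bonus."""
    scores = np.asarray(q_values, dtype=float) + beta * np.asarray(novelty)
    return int(np.argmax(scores))
```

In the DQN loop this would replace epsilon-greedy: call `select_action(q_net(state), rnd.novelty(state))` to pick the action, then `rnd.update(state, action)` after the step. Whether the bonus should decay or be normalised is an open question.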