
NGU implementation

Open TakieddineSOUALHI opened this issue 1 year ago • 2 comments

Hi, first of all, thank you for providing these implementations to the community.

I have a few questions about your NGU implementation. The original work uses two networks: a randomly fixed network, as in RND, and an embedding network to calculate the exploration rewards. The idea of the embedding network is to represent states in episodic memory and to use those representations later to calculate the intrinsic rewards. Also, the embedding network is trained at each iteration on state-action pairs (s, a) sampled from the replay buffer.
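
To make sure we are talking about the same mechanism, here is a minimal sketch of my understanding of the paper's design (Badia et al., 2020): an embedding network trained through an inverse-dynamics head, plus a k-nearest-neighbour episodic reward over the current episode's embeddings. The names `EmbeddingNet`, `InverseModel`, and `episodic_reward` are illustrative, not taken from RLeXplore, and the kernel constants are simplified relative to the paper:

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Maps observations to a controllable-state embedding."""
    def __init__(self, obs_dim, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))

    def forward(self, obs):
        return self.net(obs)

class InverseModel(nn.Module):
    """Predicts a_t from the embeddings of s_t and s_{t+1}; its loss is
    what trains EmbeddingNet in the paper."""
    def __init__(self, emb_dim, num_actions):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * emb_dim, 128), nn.ReLU(),
                                  nn.Linear(128, num_actions))

    def forward(self, e_t, e_tp1):
        return self.head(torch.cat([e_t, e_tp1], dim=-1))

def episodic_reward(embedding, episodic_memory, k=10, eps=1e-3):
    """k-NN pseudo-count reward over the current episode's embeddings
    (running-mean distance normalisation from the paper omitted)."""
    if len(episodic_memory) == 0:
        return 1.0
    memory = torch.stack(episodic_memory)                   # [N, emb_dim]
    dists = torch.cdist(embedding.unsqueeze(0), memory)[0]  # [N]
    knn = torch.topk(dists, k=min(k, len(episodic_memory)), largest=False).values
    kernel = eps / (knn ** 2 + eps)                         # inverse kernel similarity
    return 1.0 / (kernel.sum().sqrt() + 1e-8).item()
```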

My questions are:

  • How does this implementation handle the episodic memory and the training of the embedding network? If I understand your implementation correctly, you treat the buffer (either replay or rollout) as the episodic memory and use it to embed states.
  • While the embedding network is used to calculate the intrinsic rewards, a predictor network is the one that is trained and used for the RND rewards. I didn't understand this part quite well. Can you elaborate on this point, please? (See the sketch below for the formulation I have in mind.)
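
For reference, this is how I read the paper's combination of the episodic reward with the RND life-long novelty signal; the helper name `ngu_intrinsic_reward` and the clip constant `L` are from my reading of the paper, not from this repo:

```python
def ngu_intrinsic_reward(r_episodic, rnd_error, rnd_mean, rnd_std, L=5.0):
    """Modulate the episodic reward with the RND life-long signal:
    alpha_t = 1 + (err_t - mu) / sigma, then
    r^i_t = r_episodic_t * min(max(alpha_t, 1), L)."""
    alpha = 1.0 + (rnd_error - rnd_mean) / (rnd_std + 1e-8)
    return r_episodic * min(max(alpha, 1.0), L)
```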

TakieddineSOUALHI commented on Feb 18 '23