
R2D2

Open RajGhugare19 opened this issue 5 years ago • 2 comments

Implementing the recurrent and distributed RL algorithm R2D2 (https://openreview.net/pdf?id=r1lyTjAqYX).
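To make the recurrent part concrete, here is a minimal sketch of R2D2's "burn-in" idea. All names here (`SimpleRNN`, `step`, `unroll`, `burn_in`) are illustrative only, not the ReinforcementLearning.jl API: each replay entry stores a sequence of observations plus the recurrent state at the start of the sequence, and the first `burn_in` steps only refresh the hidden state and are excluded from the loss.

```julia
# Hypothetical sketch of R2D2-style burn-in with a toy RNN cell.
struct SimpleRNN
    Wx::Matrix{Float64}
    Wh::Matrix{Float64}
    b::Vector{Float64}
end

# One recurrent step: h′ = tanh(Wx·x + Wh·h + b)
step(rnn::SimpleRNN, h, x) = tanh.(rnn.Wx * x .+ rnn.Wh * h .+ rnn.b)

function unroll(rnn::SimpleRNN, h0, xs; burn_in = 2)
    h = h0
    # Burn-in phase: refresh the hidden state, discard the outputs.
    for t in 1:burn_in
        h = step(rnn, h, xs[t])
    end
    # Training phase: keep the hidden states used for the Q-value loss.
    outputs = Vector{typeof(h)}()
    for t in (burn_in + 1):length(xs)
        h = step(rnn, h, xs[t])
        push!(outputs, h)
    end
    return outputs
end
```

The stored initial state `h0` comes from the actor at collection time; the paper also discusses recomputing it ("stored state" vs. "zero start state") since it becomes stale as the network trains.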

RajGhugare19 avatar Oct 28 '20 20:10 RajGhugare19

  1. In Prioritized_DQN, where are the new states, rewards, and terminals extracted from the env?
  2. Where do we sample the batch from the replay buffer?
  3. To implement multiple actors for distributed DQN, should I use a structure similar to the multithreaded envs used for A2C?

RajGhugare19 avatar Nov 05 '20 06:11 RajGhugare19

https://github.com/JuliaReinforcementLearning/ReinforcementLearningCore.jl/blob/df518b60103277adfb0a49f30902b96dcc16b9c2/src/components/agents/agent.jl#L103-L116

  1. https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/blob/e28836194fcd8256d2bb28d64f8bee0a788f2e69/src/algorithms/dqns/common.jl#L7

  2. No. A general implementation will be provided in https://github.com/JuliaReinforcementLearning/DistributedReinforcementLearning.jl . It will be very different from the MultiThreadingEnv in A2C; in short, it will be a combination of rllib and acme. The goal is to make it easy to scale an experiment from a single machine to a large cluster. I'm still working on it, so please be patient. I'd suggest you focus on the recurrent part first; the distributed part will be supported out of the box later.
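The two linked lines follow a common pattern that is worth sketching for orientation. The names below (`ReplayBuffer`, `push_transition!`, `sample_batch`) are purely illustrative, not the actual ReinforcementLearning.jl API: the agent's update hook pushes each (s, a, r, terminal, s′) transition into a circular trajectory as the env steps, and the learner later draws a random batch from it.

```julia
# Illustrative-only sketch of the store-then-sample pattern.
mutable struct ReplayBuffer
    capacity::Int
    data::Vector{NTuple{5,Any}}   # each entry: (s, a, r, terminal, s′)
end

ReplayBuffer(capacity) = ReplayBuffer(capacity, NTuple{5,Any}[])

# Called by the agent after each env step; oldest entry is evicted
# once the buffer is full (circular-buffer behavior).
function push_transition!(buf::ReplayBuffer, s, a, r, terminal, s′)
    length(buf.data) == buf.capacity && popfirst!(buf.data)
    push!(buf.data, (s, a, r, terminal, s′))
end

# Called by the learner to draw a uniform random training batch.
sample_batch(buf::ReplayBuffer, n) = buf.data[rand(1:length(buf.data), n)]
```

Prioritized_DQN replaces the uniform `rand` here with sampling proportional to stored TD-error priorities, but the push/sample split is the same.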

findmyway avatar Nov 05 '20 12:11 findmyway