ReinforcementLearning.jl
R2D2
Implementing the recurrent and distributed RL algorithm R2D2 (https://openreview.net/pdf?id=r1lyTjAqYX).
- In Prioritized_DQN, where are the new states, rewards, and terminals extracted from the env?
- Where is the batch sampled from the replay buffer?
- To implement multiple actors for distributed DQN, should I use a structure similar to the multi-threaded envs used for A2C?
- https://github.com/JuliaReinforcementLearning/ReinforcementLearningCore.jl/blob/df518b60103277adfb0a49f30902b96dcc16b9c2/src/components/agents/agent.jl#L103-L116 (a self-contained sketch of this pattern follows after this list)
- https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/blob/e28836194fcd8256d2bb28d64f8bee0a788f2e69/src/algorithms/dqns/common.jl#L7 (see the sampling sketch below)
- No. There will be a general implementation provided in https://github.com/JuliaReinforcementLearning/DistributedReinforcementLearning.jl. It will be very different from the MultiThreadEnv used in A2C; in short, it will be a combination of rllib and acme. The goal is to make it easy to scale an experiment from one machine to a large cluster of machines. I'm still working on it, so please be patient. I'd suggest focusing on the recurrent part first (see the last sketch below); the distributed part will be supported out of the box later.
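To the first question: the pattern in the linked agent.jl lines is that the agent acts on the env and then reads the new state, reward, and terminal flag back from the env before pushing the transition into its trajectory. Below is a minimal, self-contained sketch of that pattern; `ToyEnv`, `act!`, `observe`, and `run_episode!` are illustrative names, not the actual RLCore API.

```julia
# Toy sketch (hypothetical names, not the actual RLCore API) of the pattern
# in the linked agent.jl lines: after acting, the env is queried again for
# the new state, reward, and terminal flag, and the transition is stored.
mutable struct ToyEnv
    pos::Int
end
act!(env::ToyEnv, a) = (env.pos += a)            # advance the env one step
observe(env::ToyEnv) = (state = env.pos,
                        reward = Float64(env.pos == 5),
                        terminal = env.pos >= 5)

function run_episode!(policy, env, buffer)
    s = observe(env).state
    while true
        a = policy(s)
        act!(env, a)
        # This is the step the question asks about: the new state, reward,
        # and terminal are read back from the env *after* acting, then pushed
        # into the trajectory buffer.
        o = observe(env)
        push!(buffer, (state = s, action = a, reward = o.reward,
                       terminal = o.terminal, next_state = o.state))
        o.terminal && break
        s = o.state
    end
    return buffer
end

buffer = NamedTuple[]
run_episode!(s -> 1, ToyEnv(0), buffer)   # always step right until pos == 5
```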
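To the second question: the batch is sampled inside the `update!` path reachable from the common.jl line linked above, once the trajectory holds enough transitions. Here is a hypothetical, simplified version of prioritized sampling with importance-sampling correction; `sample_batch` and its arguments are illustrative, not the Zoo's actual API.

```julia
using StatsBase: sample, Weights

# Simplified prioritized sampling: transitions are drawn with probability
# proportional to their priority (e.g. |TD-error|), and importance-sampling
# weights correct the bias introduced by non-uniform sampling.
function sample_batch(buffer::Vector, priorities::Vector{Float64},
                      batch_size::Int; β = 0.4)
    probs = priorities ./ sum(priorities)
    inds  = sample(1:length(buffer), Weights(probs), batch_size; replace = true)
    w = (length(buffer) .* probs[inds]) .^ (-β)
    w ./= maximum(w)                      # normalize weights for stability
    return buffer[inds], inds, w
end

buffer     = collect(1:100)               # stand-in transitions
priorities = rand(100) .+ 1e-3            # e.g. |TD-error| + small constant
batch, inds, w = sample_batch(buffer, priorities, 32)
```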
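For the recurrent part the reply suggests starting with, here is a sketch of the core idea from the R2D2 paper (not existing Zoo code): a stateful LSTM Q-network is evaluated over a stored sequence, with the first few observations used only to warm up the hidden state ("burn-in"). It assumes a Flux version with stateful recurrent layers (`LSTM(in, out)`, `Flux.reset!`); all sizes are arbitrary.

```julia
using Flux

# LSTM-based Q-network: 4 observation dims, 2 actions (arbitrary sizes).
q_net = Chain(Dense(4, 32, relu), LSTM(32, 32), Dense(32, 2))

function q_values(q_net, obs_seq; burnin = 2)
    Flux.reset!(q_net)                         # start from a zero hidden state
    foreach(o -> q_net(o), obs_seq[1:burnin])  # burn-in: outputs discarded
    return [q_net(o) for o in obs_seq[burnin+1:end]]  # Q-values for training
end

obs_seq = [rand(Float32, 4) for _ in 1:10]     # toy length-10 sequence
qs = q_values(q_net, obs_seq)                  # 8 vectors of 2 Q-values each
```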