Prioritized Experience Replay for DQN
### 🚀 Feature

Prioritized Experience Replay for DQN

### Motivation

_No response_

### Pitch

_No response_

### Alternatives

_No response_

### Additional context

_No response_

### Checklist
- [X] I have checked that there is no similar issue in the repo
It's planned, contributions are welcome 🙂
See https://github.com/DLR-RM/stable-baselines3/issues/622
@araffin @qgallouedec Hello, is there any news on prioritized experience replay, or are you still waiting for contributions?
We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.
Hi @araffin - just looking at this. How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).
> How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).
Not sure, I need to take a deeper look, but probably one tree for all envs if possible, or whatever is cleaner/fast enough. We might need to do something similar to: https://github.com/DLR-RM/stable-baselines3/pull/704
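For context on the data structure being discussed: a sum segment tree keeps one priority per stored transition in its leaves, with each internal node holding the sum of its children, so sampling a transition proportionally to its priority is O(log n). Below is a minimal, illustrative sketch (the class and method names are made up here, not SB3's or Tianshou's API):

```python
import numpy as np


class SumTree:
    """Minimal binary sum tree: leaves hold per-transition priorities,
    internal nodes hold the sum of their children."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Flat array: capacity - 1 internal nodes followed by capacity leaves.
        self.tree = np.zeros(2 * capacity - 1)

    def update(self, leaf_idx: int, priority: float) -> None:
        """Set the priority of one leaf and propagate the change to the root."""
        tree_idx = leaf_idx + self.capacity - 1
        delta = priority - self.tree[tree_idx]
        self.tree[tree_idx] = priority
        while tree_idx > 0:
            tree_idx = (tree_idx - 1) // 2
            self.tree[tree_idx] += delta

    def total(self) -> float:
        """Sum of all priorities (the root node)."""
        return self.tree[0]

    def sample(self, value: float) -> int:
        """Return the leaf index whose cumulative priority range contains
        `value`, where 0 <= value <= total(). Drawing `value` uniformly
        gives priority-proportional sampling."""
        idx = 0
        while idx < self.capacity - 1:  # descend while on an internal node
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)
```

Whether there is one such tree over all transitions or one per environment is exactly the design question raised above; the tree itself is agnostic to that choice.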
> How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).
I think it might matter depending on the replacement strategy: do you overwrite the latest observation or the one with the lowest priority? And what happens if the VecEnv holds different environments, e.g. LunarLander with different gravity/wind parameters? If one environment is significantly more difficult than the others, wouldn't a joint buffer be skewed toward it? "Hard overall" observations vs. "hard for each env on average" observations. It's more of a theoretical question, though.
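For the "one tree for all envs" option, the main plumbing is mapping between the flat leaf indices of the shared tree and SB3's `(buffer_size, n_envs)` storage layout. A hedged sketch, assuming row-major flattening (`flat = pos * n_envs + env`); the helper names are hypothetical, not part of any existing API:

```python
import numpy as np


def flat_to_buffer_indices(flat_indices: np.ndarray, n_envs: int):
    """Map flat leaf indices of a single shared priority tree back to
    (position, env) indices of a (buffer_size, n_envs) storage layout."""
    return flat_indices // n_envs, flat_indices % n_envs


def leaves_for_position(pos: int, n_envs: int) -> np.ndarray:
    """Leaf indices whose priorities must be reset (e.g. to the current max
    priority) when the circular buffer overwrites position `pos`, since one
    insertion step writes a transition for every parallel env."""
    return pos * n_envs + np.arange(n_envs)
```

Under this layout, the skew Howuhh describes shows up naturally: a consistently harder env contributes larger priorities at its leaves, so a joint tree samples it more often, whereas one tree per env would normalize sampling within each env first.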
> We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.
Hello @araffin, since I've recently used and contributed to @Howuhh's PER implementation, and since I'm also familiar with SB3 (having contributed before), I could work on adapting it for this library! (And maybe @Howuhh wants to join as well?)