Prioritized Experience Replay for DQN
### 🚀 Feature

Prioritized Experience Replay for DQN

### Motivation

_No response_

### Pitch

_No response_

### Alternatives

_No response_

### Additional context

_No response_

### Checklist
- [X] I have checked that there is no similar issue in the repo
It's planned, contributions are welcome 🙂
See https://github.com/DLR-RM/stable-baselines3/issues/622
@araffin @qgallouedec Hello, is there any news on prioritized experience replay, or are you still waiting for contributions?
We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.
Hi @araffin - just looking at this. How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).
> How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).
Not sure, I need to take a deeper look, but probably one tree for all envs if possible, or whatever is cleaner/fast enough. We might need to do something similar to: https://github.com/DLR-RM/stable-baselines3/pull/704
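For context on the data structure being discussed: a sum segment tree keeps one priority per stored transition in its leaves, with each internal node holding the sum of its children, so sampling a transition proportionally to its priority is O(log n). Below is a minimal, illustrative sketch (the class and method names are made up here, not SB3's or Tianshou's API):

```python
import numpy as np


class SumTree:
    """Minimal binary sum tree: leaves hold per-transition priorities,
    internal nodes hold the sum of their children."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Flat array: capacity - 1 internal nodes followed by capacity leaves.
        self.tree = np.zeros(2 * capacity - 1)

    def update(self, leaf_idx: int, priority: float) -> None:
        """Set the priority of one leaf and propagate the change to the root."""
        tree_idx = leaf_idx + self.capacity - 1
        delta = priority - self.tree[tree_idx]
        self.tree[tree_idx] = priority
        while tree_idx > 0:
            tree_idx = (tree_idx - 1) // 2
            self.tree[tree_idx] += delta

    def total(self) -> float:
        """Sum of all priorities (the root node)."""
        return self.tree[0]

    def sample(self, value: float) -> int:
        """Return the leaf index whose cumulative priority range contains
        `value`, where 0 <= value <= total(). Drawing `value` uniformly
        gives priority-proportional sampling."""
        idx = 0
        while idx < self.capacity - 1:  # descend while on an internal node
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)
```

Whether there is one such tree over all transitions or one per environment is exactly the design question raised above; the tree itself is agnostic to that choice.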
> How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).
I think it might matter depending on the replacement strategy: do you overwrite the latest observation or the one with the lowest priority? And what happens if the VecEnv holds different environments, e.g. LunarLander with different gravity/wind parameters? If one environment is significantly more difficult than the others, wouldn't a joint buffer be skewed toward it? "Hard overall" observations vs. "hard for each env on average" observations. It's more of a theoretical question, though.
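For the "one tree for all envs" option, the main plumbing is mapping between the flat leaf indices of the shared tree and SB3's `(buffer_size, n_envs)` storage layout. A hedged sketch, assuming row-major flattening (`flat = pos * n_envs + env`); the helper names are hypothetical, not part of any existing API:

```python
import numpy as np


def flat_to_buffer_indices(flat_indices: np.ndarray, n_envs: int):
    """Map flat leaf indices of a single shared priority tree back to
    (position, env) indices of a (buffer_size, n_envs) storage layout."""
    return flat_indices // n_envs, flat_indices % n_envs


def leaves_for_position(pos: int, n_envs: int) -> np.ndarray:
    """Leaf indices whose priorities must be reset (e.g. to the current max
    priority) when the circular buffer overwrites position `pos`, since one
    insertion step writes a transition for every parallel env."""
    return pos * n_envs + np.arange(n_envs)
```

Under this layout, the skew Howuhh describes shows up naturally: a consistently harder env contributes larger priorities at its leaves, so a joint tree samples it more often, whereas one tree per env would normalize sampling within each env first.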
> We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.
Hello @araffin, since I've recently used and contributed to @Howuhh's PER implementation, and since I'm also familiar with SB3 (having contributed before), I could work on adapting it for this library! (And maybe @Howuhh wants to join as well?)