Prioritized Experience Replay for DQN

Open vnvdev opened this issue 2 years ago • 10 comments

🚀 Feature

Prioritized Experience Replay for DQN

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

  • [X] I have checked that there is no similar issue in the repo

vnvdev avatar Dec 26 '22 10:12 vnvdev

It's planned, contributions are welcome 🙂

qgallouedec avatar Dec 26 '22 11:12 qgallouedec

See https://github.com/DLR-RM/stable-baselines3/issues/622

araffin avatar Dec 26 '22 12:12 araffin

@araffin @qgallouedec Hello, is there any news on prioritized experience replay, or are you still waiting for contributions?

AlexPasqua avatar Mar 17 '23 15:03 AlexPasqua

or are you still waiting for contributions?

We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.

araffin avatar Mar 18 '23 08:03 araffin

Hi @araffin - just looking at this. How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).

emrul avatar Mar 21 '23 15:03 emrul

How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).

Not sure, I need to take a deeper look, but probably one tree for all envs if possible, or whatever is cleaner and fast enough. We might need to do something similar to: https://github.com/DLR-RM/stable-baselines3/pull/704
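To make the "one tree for all envs" option concrete, here is a minimal sketch in pure Python. It assumes transitions are stored in a grid of `buffer_size` positions by `n_envs` environments and that each slot is mapped to one leaf of a single sum tree. `SumTree`, `flat_index` and `unflatten` are hypothetical names used for illustration only; they are not part of SB3 or of the implementation linked above.

```python
import random


class SumTree:
    """Binary sum tree over priorities (hypothetical helper, not SB3 API)."""

    def __init__(self, n_slots: int) -> None:
        # Round the leaf count up to a power of two so the top-down
        # descent in find() visits leaves in index order.
        self.capacity = 1
        while self.capacity < n_slots:
            self.capacity *= 2
        # Node 1 is the root; leaves live at [capacity, 2 * capacity).
        self.nodes = [0.0] * (2 * self.capacity)

    @property
    def total(self) -> float:
        return self.nodes[1]

    def update(self, leaf: int, priority: float) -> None:
        """Set the priority of one leaf and refresh the sums above it."""
        i = leaf + self.capacity
        self.nodes[i] = priority
        i //= 2
        while i >= 1:
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]
            i //= 2

    def find(self, mass: float) -> int:
        """Return the leaf whose cumulative priority interval contains mass."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if mass < self.nodes[left]:
                i = left
            else:
                mass -= self.nodes[left]
                i = left + 1
        return i - self.capacity


def flat_index(pos: int, env: int, n_envs: int) -> int:
    """Map a (buffer position, env index) pair to a single leaf index."""
    return pos * n_envs + env


def unflatten(leaf: int, n_envs: int) -> tuple:
    """Inverse of flat_index: recover (buffer position, env index)."""
    return divmod(leaf, n_envs)


# Usage: one tree shared by 2 envs with 4 buffer positions each.
n_envs, buffer_size = 2, 4
tree = SumTree(buffer_size * n_envs)
tree.update(flat_index(0, 0, n_envs), 1.0)  # transition at pos 0 of env 0
tree.update(flat_index(1, 1, n_envs), 3.0)  # transition at pos 1 of env 1
mass = random.random() * tree.total  # uniform in [0, total)
pos, env = unflatten(tree.find(mass), n_envs)
```

With this layout, sampling stays a single O(log N) descent regardless of the number of envs; the per-env alternative would instead keep `n_envs` such trees and pick an env before descending.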

araffin avatar Mar 29 '23 13:03 araffin

Hi @araffin - just looking at this. How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).

I think it might matter depending on the replacement strategy. Do you overwrite the latest observation or the one with the lowest priority? What happens if the VecEnv holds different environments, e.g. LunarLander with different gravity / wind parameters? If one environment is significantly more difficult than the others, wouldn't a joint buffer be skewed toward it? That is, "hard overall" observations vs "hard for each on average" observations. It's more of a theoretical question though.
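The skew can be illustrated with a small back-of-the-envelope sketch. It assumes proportional prioritization (sampling probability proportional to priority) and, for the per-env variant, that an environment is picked uniformly before sampling from its own tree. The function names and the priority values are made up for illustration.

```python
def env_share_joint(priorities_per_env):
    """Share of samples drawn from each env when all priorities sit in one tree."""
    totals = [sum(p) for p in priorities_per_env]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def env_share_per_env(priorities_per_env):
    """Share of samples per env when an env is chosen uniformly first."""
    n = len(priorities_per_env)
    return [1.0 / n] * n


# Hypothetical priorities: the second env produces much larger TD errors.
easy = [1.0, 1.0, 1.0, 1.0]
hard = [9.0, 9.0, 9.0, 9.0]

joint = env_share_joint([easy, hard])
separate = env_share_per_env([easy, hard])
```

Under these assumptions the joint tree draws most samples from the harder env ("hard overall"), while separate trees keep the envs balanced and only prioritize within each one ("hard for each on average").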

mkhlyzov avatar Apr 25 '23 10:04 mkhlyzov

We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.

Hello @araffin, since I've recently used and contributed to @Howuhh's PER implementation, and since I'm also familiar with SB3 (having contributed before), I could work on adapting it for this library! (And maybe @Howuhh wants to join as well?)

AlexPasqua avatar Jul 20 '23 13:07 AlexPasqua

@AlexPasqua even though I think it's very important, I'm unfortunately busy integrating Minari into CORL at the moment, so I'm unlikely to find the time to do it. But I'd be glad if my implementation turns out to be useful!

Howuhh avatar Jul 20 '23 13:07 Howuhh

@AlexPasqua even though I think it's very important, I'm unfortunately busy integrating Minari into CORL at the moment, so I'm unlikely to find the time to do it. But I'd be glad if my implementation turns out to be useful!

Alright, no problem, I'll do it myself :)

AlexPasqua avatar Jul 20 '23 19:07 AlexPasqua