Implement Beyond the Rainbow (BTR) Algorithm
I would like to contribute Beyond the Rainbow (BTR) to Stable-Baselines3. BTR improves over Rainbow Deep Q-Network (DQN) with six additions from across the RL literature and is designed with computational efficiency in mind, so it can be trained on high-end desktop PCs.
Paper: https://arxiv.org/abs/2411.03820
Code: https://github.com/VIPTankz/BTR
Background
Beyond the Rainbow (BTR) is an image-based RL algorithm for discrete action spaces that improves over Rainbow DQN by adding six further components: an Impala CNN (scale = 2), adaptive max-pooling (6x6), spectral normalization, Implicit Quantile Networks, Munchausen RL and vectorized environments. The algorithm has started to gain traction (https://scholar.google.com/scholar?cites=3310089883274021659).
BTR is competitive with recent algorithms such as Dreamer-v3 (Hafner et al., 2023) and MEME (Kapturowski et al., 2023), despite its focus on training in more resource-constrained settings such as desktop PCs. Benchmarked on a high-end desktop PC, it achieves a human-normalized interquartile mean (IQM) of 7.4 on Atari-60 within 12 hours.
The implementation is based on PyTorch.
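To make the architectural side of the proposal concrete, here is a rough PyTorch sketch of an Impala-style feature extractor with spectral normalization and a 6x6 adaptive max-pool, as listed among BTR's components. All class names, layer widths and the input resolution are my own illustrative choices, not the reference implementation:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class ResidualBlock(nn.Module):
    """Impala-style residual block with spectral-normalized convolutions."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        out = self.conv1(torch.relu(x))
        out = self.conv2(torch.relu(out))
        return x + out  # residual connection


class ImpalaBlock(nn.Module):
    """Conv + max-pool downsampling followed by two residual blocks."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        self.res1 = ResidualBlock(out_ch)
        self.res2 = ResidualBlock(out_ch)

    def forward(self, x):
        x = self.pool(self.conv(x))
        return self.res2(self.res1(x))


class BTRFeatureExtractor(nn.Module):
    """Impala CNN at scale=2 with a fixed 6x6 adaptive max-pool output,
    so the flattened feature size is independent of input resolution."""

    def __init__(self, in_channels: int = 4, scale: int = 2):
        super().__init__()
        widths = [16 * scale, 32 * scale, 32 * scale]  # Impala widths x scale
        blocks, ch = [], in_channels
        for out_ch in widths:
            blocks.append(ImpalaBlock(ch, out_ch))
            ch = out_ch
        self.blocks = nn.Sequential(*blocks)
        self.adaptive_pool = nn.AdaptiveMaxPool2d((6, 6))

    def forward(self, x):
        x = torch.relu(self.adaptive_pool(self.blocks(x)))
        return x.flatten(1)  # (batch, 64 * 6 * 6) at scale=2
```

With a stacked 4-frame 84x84 Atari observation, the extractor yields a 2304-dimensional feature vector regardless of input size, which is the point of the adaptive pooling.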
Benefits
- Provide a state-of-the-art algorithm that can be trained on high-end desktop PCs, which is of interest to smaller research labs and hobbyists who don't have access to the hardware required by more resource-intensive algorithms.
- BTR can handle complex 3D games and has been used to train agents for Super Mario Galaxy, Mario Kart and Mortal Kombat (https://www.youtube.com/playlist?list=PL4geUsKi0NN-sjbuZP_fU28AmAPQunLoI), gaining interest from the community building game-playing agents.
Practical Details
I will be working with the original author, Tyler Clark, to ensure that the SB3 implementation achieves the performance stated in the paper.
Hello, thanks for the proposal, but I would actually prefer to have Rainbow first, see https://github.com/DLR-RM/stable-baselines3/issues/622 and the related PR https://github.com/DLR-RM/stable-baselines3/pull/1622
We need help there, especially for benchmarking and for an efficient PER implementation that works with VecEnv.
Hello,
Thanks for clarifying the priorities. I understand your preference for implementing Rainbow first, and we may be able to help with the PER implementation and benchmarking if that accelerates Rainbow's integration.
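To make that offer concrete, here is a minimal sum-tree sketch of the kind we could start from for proportional PER, with a stratified batch sampler that would serve transitions collected from multiple vectorized environments. All names here are placeholders of mine, not SB3 API, and the capacity is assumed to be a power of two for simplicity:

```python
import numpy as np


class SumTree:
    """Binary sum-tree: O(log n) priority updates and proportional sampling.

    Leaves hold transition priorities; each internal node stores the sum of
    its children, so the root is the total priority mass.
    Capacity is assumed to be a power of two for simplicity.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Index 1 is the root; leaves occupy [capacity, 2 * capacity).
        self.tree = np.zeros(2 * capacity)

    @property
    def total(self) -> float:
        return self.tree[1]

    def update(self, idx: int, priority: float) -> None:
        """Set the priority of leaf `idx`, propagating the change to the root."""
        pos = idx + self.capacity
        delta = priority - self.tree[pos]
        while pos >= 1:
            self.tree[pos] += delta
            pos //= 2

    def sample(self, value: float) -> int:
        """Descend from the root to find the leaf covering `value` in [0, total)."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value <= self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity


def sample_batch(tree: SumTree, batch_size: int, rng: np.random.Generator):
    """Stratified proportional sampling: one draw per equal-mass segment,
    which keeps samples spread across the priority range."""
    segment = tree.total / batch_size
    return [
        tree.sample(rng.uniform(i * segment, (i + 1) * segment))
        for i in range(batch_size)
    ]
```

For VecEnv compatibility, the idea would be that each environment step inserts `n_envs` transitions (one `update` per leaf), while sampling stays a single batched operation over the shared tree.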
Once PER is integrated and validated, we'd like to propose adding BTR as an extension that offers the computationally efficient improvements described above for desktop-scale training.
Does that sound like a reasonable path going forward?
this sounds reasonable =)
Make sure to read the contributing guide to know what is expected.
You might also have a look at other PRs that implemented new algorithms:
- https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/40
- https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/243
- https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53
Thanks for confirming!
I’ll make sure to follow the contributing guide and check out the referenced PRs.
I appreciate the guidance. I'll update as things progress.