
Implement Beyond the Rainbow (BTR) Algorithm

jbuerman opened this issue 4 weeks ago · 3 comments

I would like to contribute Beyond the Rainbow (BTR) to Stable-Baselines3. BTR improves over Rainbow Deep Q-Network (DQN) with six improvements drawn from across the RL literature and is designed with computational efficiency in mind, so that it can be trained on high-end desktop PCs.

Paper: https://arxiv.org/abs/2411.03820

Code: https://github.com/VIPTankz/BTR

Background

Beyond the Rainbow (BTR) is an image-based RL algorithm for discrete action spaces that improves over Rainbow DQN by adding six further improvements, namely an Impala architecture (scale = 2), adaptive max-pooling (6x6), spectral normalization, Implicit Quantile Networks, Munchausen RL, and vectorized environments. The algorithm has started to gain traction (https://scholar.google.com/scholar?cites=3310089883274021659).
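As a rough illustration of how some of these components fit together, here is a minimal PyTorch sketch of an Impala-style trunk with spectral normalization and a 6x6 adaptive max-pool. The channel widths, block counts, and the exact placement of spectral normalization are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm


class ResidualBlock(nn.Module):
    """Impala-style residual block. Applying spectral_norm to the convs
    here is an assumption about where BTR uses it; the paper only lists
    spectral normalization as one of its six additions."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.conv1(self.act(x))
        h = self.conv2(self.act(h))
        return x + h


class ImpalaCNN(nn.Module):
    """Sketch of an Impala (scale=2) trunk ending in the 6x6 adaptive
    max-pool mentioned above; channel widths are illustrative."""

    def __init__(self, in_channels: int = 4, scale: int = 2):
        super().__init__()
        layers = []
        for out_channels in (16 * scale, 32 * scale, 32 * scale):
            layers += [
                nn.Conv2d(in_channels, out_channels, 3, padding=1),
                nn.MaxPool2d(3, stride=2, padding=1),
                ResidualBlock(out_channels),
                ResidualBlock(out_channels),
            ]
            in_channels = out_channels
        self.trunk = nn.Sequential(*layers)
        # adaptive pooling yields a fixed 6x6 map regardless of input size
        self.pool = nn.AdaptiveMaxPool2d((6, 6))

    def forward(self, x):
        return torch.flatten(self.pool(self.trunk(x)), start_dim=1)
```

With scale=2 and a standard 4-frame 84x84 Atari stack, this produces a flat feature vector of size 64 * 6 * 6 that an IQN head could consume.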

BTR is competitive with recent algorithms such as Dreamer-v3 (Hafner et al., 2023) and MEME (Kapturowski et al., 2023), despite its focus on training in more resource-restricted settings such as desktop PCs. The algorithm has been benchmarked on a high-end desktop PC, achieving a human-normalized interquartile mean (IQM) of 7.4 on Atari-60 within 12 hours of training.

The implementation is based on PyTorch.

Benefits

  • Provides a state-of-the-art algorithm that can be trained on a high-end desktop PC, which is of interest to smaller research labs and hobbyists who lack the hardware to train more resource-intensive algorithms.
  • BTR can handle complex 3D games and has been used to train agents for Super Mario Galaxy, Mario Kart and Mortal Kombat (https://www.youtube.com/playlist?list=PL4geUsKi0NN-sjbuZP_fU28AmAPQunLoI), attracting interest from the community building game-playing agents.

Practical Details

I will be working with the original author Tyler Clark to ensure that the SB3 implementation will achieve the performance stated in the paper.

jbuerman · Dec 02 '25 09:12

Hello, thanks for the proposal, but I would actually prefer to have Rainbow first, see https://github.com/DLR-RM/stable-baselines3/issues/622 and related PRs like https://github.com/DLR-RM/stable-baselines3/pull/1622

We need help there, especially for benchmarking and for an efficient PER implementation that works with VecEnv.
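For concreteness, here is a minimal sketch of a proportional PER buffer whose add() accepts one transition per parallel env, as a VecEnv step produces them. This is an illustrative interface only, not SB3's ReplayBuffer API, and it uses a plain priority array (O(N) sampling) for clarity, where a real implementation would use a sum-tree:

```python
import numpy as np


class VecPrioritizedBuffer:
    """Illustrative proportional-PER buffer fed by a vectorized env.
    Hypothetical API, not SB3's; a production version needs a sum-tree
    for O(log N) priority updates and sampling."""

    def __init__(self, size: int, n_envs: int, alpha: float = 0.6):
        self.size, self.n_envs, self.alpha = size, n_envs, alpha
        self.pos, self.full = 0, False
        self.storage = [None] * size
        self.priorities = np.zeros(size, dtype=np.float64)

    def add(self, transitions):
        """`transitions` is a list of length n_envs (one per parallel env).
        New samples get the current max priority so they are replayed soon."""
        max_p = self.priorities.max() if (self.full or self.pos > 0) else 1.0
        for t in transitions:
            self.storage[self.pos] = t
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.size
            self.full = self.full or self.pos == 0

    def sample(self, batch_size: int, beta: float = 0.4):
        upper = self.size if self.full else self.pos
        probs = self.priorities[:upper] ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(upper, batch_size, p=probs)
        # importance-sampling weights, normalized by the max weight
        weights = (upper * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.storage[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps: float = 1e-6):
        self.priorities[idx] = np.abs(td_errors) + eps
```

The only VecEnv-specific part is that add() takes a batch rather than a single transition; the sampling and priority-update logic is the standard proportional scheme from the PER paper.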

araffin · Dec 02 '25 09:12

Hello,

Thanks for clarifying the priorities. I understand your preference to have Rainbow implemented first, and we may be able to help with the PER implementation and benchmarking if that accelerates Rainbow's integration.

Once PER is integrated and validated, we'd like to propose adding BTR as an extension that contributes its computationally efficient improvements for desktop-scale training.

Does that sound like a reasonable path going forward?

jbuerman · Dec 05 '25 21:12

Does that sound like a reasonable path going forward?

this sounds reasonable =)

Make sure to read the contributing guide to know what is expected.

You might also have a look at other PRs that implemented new algorithms:

  • https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/40
  • https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/243
  • https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53

araffin · Dec 08 '25 13:12

Thanks for confirming!

I’ll make sure to follow the contributing guide and check out the referenced PRs.

I appreciate the guidance. I'll update as things progress.

jbuerman · Dec 12 '25 12:12