
Implemented CrossQ

Open danielpalen opened this issue 2 months ago • 6 comments

This PR implements CrossQ (https://openreview.net/pdf?id=PczQtTsTIX), a novel off-policy deep RL algorithm that makes careful use of batch normalisation and removes target networks. It achieves state-of-the-art sample efficiency at a much lower computational cost, because it does not require large update-to-data ratios.
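For readers unfamiliar with the idea, here is a minimal sketch of how a critic update without a target network can look in PyTorch. It is not the code from this PR. It assumes a single critic, omits the SAC entropy term, and uses plain `nn.BatchNorm1d` as a stand-in for whatever normalisation layer the implementation actually uses. The names `BNCritic` and `crossq_critic_loss` and the layer sizes are hypothetical. The point it illustrates is the joint forward pass: current and next state-action pairs go through the critic as one batch, so the batch-norm statistics cover both.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BNCritic(nn.Module):
    """Q-network with batch normalisation (hypothetical, illustrative sizes)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(obs_dim + act_dim),
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.BatchNorm1d(hidden),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=1))


def crossq_critic_loss(critic, obs, act, reward, next_obs, next_act, done, gamma=0.99):
    """Simplified critic loss: one critic, no entropy term (assumptions)."""
    batch_size = obs.shape[0]
    # Joint forward pass in training mode: the batch-norm statistics are
    # computed over current and next state-action pairs together.
    q_all = critic(torch.cat([obs, next_obs], dim=0), torch.cat([act, next_act], dim=0))
    q, next_q = q_all[:batch_size], q_all[batch_size:]
    # No target network: bootstrap from the same critic and stop the gradient.
    target = reward + gamma * (1.0 - done) * next_q.detach()
    return F.mse_loss(q, target)


# Usage with random data; in practice next_act would be sampled from the policy.
torch.manual_seed(0)
critic = BNCritic(obs_dim=3, act_dim=1)
obs, next_obs = torch.randn(8, 3), torch.randn(8, 3)
act, next_act = torch.randn(8, 1), torch.randn(8, 1)
reward, done = torch.randn(8, 1), torch.zeros(8, 1)
loss = crossq_critic_loss(critic, obs, act, reward, next_obs, next_act, done)
loss.backward()
```

In a full SAC-style setup, the target would typically also take the minimum over two critics and subtract the entropy term; both are left out here to keep the joint-batch step visible.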

Description

This is a PyTorch implementation based on the original JAX implementation (https://github.com/adityab/CrossQ). The following plot shows that its performance matches both the results reported in the original paper and those of the open-source SBX implementation provided by the authors (evaluated on 10 seeds).

[Figure: sbx_reproduce]

Context

  • [x] I have raised an issue to propose this change (required). Closes #238

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [x] Documentation (update in the documentation)

Checklist:

  • [x] I've read the CONTRIBUTION guide (required)
  • [x] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • [x] I have updated the tests accordingly (required for a bug fix or a new feature).
  • [x] I have included an example of using the feature (required for new features).
  • [x] I have included baseline results (required for new training algorithms or training-related features).
  • [x] I have updated the documentation accordingly.
  • [ ] I have updated the changelog accordingly (required).
  • [x] I have reformatted the code using make format (required)
  • [x] I have checked the codestyle using make check-codestyle and make lint (required)
  • [x] I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

danielpalen, May 04 '24 11:05