stable-baselines3-contrib

SACD Discrete Soft Actor Critic

Open splatter96 opened this issue 2 years ago • 6 comments

This PR introduces the Soft Actor Critic for discrete actions (SACD) algorithm.

Description

This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the code provided by the author of the issue that requested this feature in Stable Baselines (https://github.com/toshikwa/sac-discrete.pytorch).
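For readers unfamiliar with the discrete variant, here is a minimal sketch of the quantities that distinguish SAC-Discrete from continuous SAC. It is not the code in this PR. Because the action set is finite, the policy outputs a full probability vector and the expectation over actions can be computed exactly rather than sampled. The function names (`softmax`, `soft_state_value`, `policy_loss`, `soft_q_target`) and the default `gamma` are illustrative assumptions, and plain Python is used instead of PyTorch so that the snippet runs without dependencies.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(q_values, probs, alpha):
    """Exact soft value: V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(probs, q_values)
        if p > 0.0  # p * log(p) tends to 0 as p tends to 0, so skip zero entries
    )


def policy_loss(q_values, probs, alpha):
    """Per-state policy objective: sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a))."""
    return -soft_state_value(q_values, probs, alpha)


def soft_q_target(reward, done, next_q1, next_q2, next_probs, alpha, gamma=0.99):
    """Bootstrapped target using the element-wise minimum of two target critics.

    Assumption: two target Q-networks are available, as in clipped double-Q learning.
    """
    min_q = [min(a, b) for a, b in zip(next_q1, next_q2)]
    next_value = soft_state_value(min_q, next_probs, alpha)
    return reward + gamma * (1.0 - float(done)) * next_value


if __name__ == "__main__":
    probs = softmax([0.0, 0.0])  # uniform policy over two actions
    print(soft_state_value([1.0, 3.0], probs, alpha=0.5))
    print(soft_q_target(1.0, False, [1.0, 3.0], [2.0, 2.5], probs, alpha=0.5))
```

In a real implementation these sums would be batched tensor operations, and the policy loss would be differentiated only with respect to the policy parameters while the Q-values are treated as constants.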

Context

  • [x] I have raised an issue to propose this change (required)
  • [x] Original issue in the stable baselines repo https://github.com/DLR-RM/stable-baselines3/issues/157

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [x] Documentation (update in the documentation)

Checklist:

  • [x] I've read the CONTRIBUTION guide (required)
  • [ ] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • [x] I have updated the tests accordingly (required for a bug fix or a new feature).
  • [x] I have included an example of using the feature (required for new features).
  • [ ] I have included baseline results (required for new training algorithms or training-related features).
  • [x] I have updated the documentation accordingly.
  • [ ] I have updated the changelog accordingly (required).
  • [x] I have reformatted the code using make format (required)
  • [x] I have checked the codestyle using make check-codestyle and make lint (required)
  • [x] I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

splatter96 avatar Aug 07 '23 12:08 splatter96

Hello, thanks for the PR =)

The functionality/performance matches that of the source (required for new training algorithms or training-related features).

Please don't forget that part (see the contributing guide). I think there is also some discussion of the results here: https://github.com/vwxyzjn/cleanrl/pull/270

araffin avatar Aug 12 '23 07:08 araffin

Hello, thanks for the feedback :) Sorry for the late reply! Should I add the performance comparison against the source, similar to how it is done on the official Stable Baselines3 algorithm pages? That is, create a baselines3-zoo config for it and add the plots to this PR?

splatter96 avatar Sep 01 '23 13:09 splatter96

yes please =)

araffin avatar Sep 01 '23 13:09 araffin