SACD Discrete Soft Actor Critic
This PR introduces the Soft Actor Critic for discrete actions (SACD) algorithm.
Description
This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature in stable baselines (https://github.com/toshikwa/sac-discrete.pytorch).
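For reviewers less familiar with the discrete variant: because the action set is finite, the paper computes expectations over actions exactly instead of sampling them. The sketch below illustrates the three per-state quantities from the paper (soft state value, policy loss, temperature loss) in plain Python. It is not the code in this PR. All function names are hypothetical, and `q_values` is assumed to already hold the element-wise minimum of the two critics.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(probs, q_values, alpha):
    """V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    # Actions with zero probability contribute nothing (0 * log 0 is taken as 0).
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(probs, q_values) if p > 0.0)


def policy_loss(probs, q_values, alpha):
    """J_pi(s) = sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a))."""
    return sum(p * (alpha * math.log(p) - q) for p, q in zip(probs, q_values) if p > 0.0)


def temperature_loss(probs, alpha, target_entropy):
    """J(alpha) = sum_a pi(a|s) * (-alpha * (log pi(a|s) + target_entropy))."""
    return sum(-alpha * p * (math.log(p) + target_entropy) for p in probs if p > 0.0)


# Example: two actions, hypothetical logits and Q-values (illustrative numbers only).
probs = softmax([0.0, 0.0])  # uniform policy over two actions
value = soft_state_value(probs, q_values=[1.0, 3.0], alpha=0.2)
loss = policy_loss(probs, q_values=[1.0, 3.0], alpha=0.2)
```

With a fixed policy, the policy loss for a state is the negative of its soft value, so minimizing the loss raises the entropy-regularized value. In an actual implementation these sums would be batched tensor operations averaged over states sampled from the replay buffer.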
Context
- [x] I have raised an issue to propose this change (required)
- [x] Original issue in the stable baselines repo https://github.com/DLR-RM/stable-baselines3/issues/157
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [x] Documentation (update in the documentation)
Checklist:
- [x] I've read the CONTRIBUTION guide (required)
- [ ] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
- [x] I have updated the tests accordingly (required for a bug fix or a new feature).
- [x] I have included an example of using the feature (required for new features).
- [ ] I have included baseline results (required for new training algorithms or training-related features).
- [x] I have updated the documentation accordingly.
- [ ] I have updated the changelog accordingly (required).
- [x] I have reformatted the code using `make format` (required)
- [x] I have checked the codestyle using `make check-codestyle` and `make lint` (required)
- [x] I have ensured `make pytest` and `make type` both pass. (required)
Note: we are using a maximum length of 127 characters per line
Hello, thanks for the PR =)
The functionality/performance matches that of the source (required for new training algorithms or training-related features).
Please don't forget that part (see the contributing guide). I think there is also a discussion about the results here: https://github.com/vwxyzjn/cleanrl/pull/270
Hello, thanks for the feedback :) Sorry for the late reply! Should I add a performance comparison against the source, similar to how it is done on the official stable baselines3 algorithm pages? That is, create a baselines3-zoo config for it and add the plots to this PR?
yes please =)