cleanrl
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
## Description Adopts poetry 1.2 support. Locking and adding dependencies are now orders of magnitude faster! ## Types of changes - [ ] Bug fix - [ ] New feature...
## Problem Description Upgrade the gym version used in cleanrl from 0.23.1 to 0.25.1 ## Checklist - [ ] I have installed dependencies via `poetry install` (see [CleanRL's installation guideline](https://docs.cleanrl.dev/get-started/installation/)). -...
## Description Adds implementation of **Diversity is All You Need** paper. It is an unsupervised option learning framework which can later be used for transfer learning. ### To-Do - [x]...
Hey there! I've used this repo's SAC code as starting point for an implementation of SAC-discrete ([paper](https://arxiv.org/pdf/1910.07207.pdf)) for a project of mine. If you're interested, I'd be willing to contribute...
## Problem Description Given the incredible performance of the DDPG + JAX prototype (https://github.com/vwxyzjn/cleanrl/pull/187), it's worth prototyping JAX with other algorithms as well! This issue tracks the overall progress of...
## Description This PR closes #265. Had some preliminary results w/ multi-objective stuff, as shown in the following figure. The x-axis is the normalized score of CartPole-v1 and Acrobot-v1, and...
# Overview #228 prototyped a great initial integration with optuna to do hyperparameter optimization. However, it has a couple of downsides: 1. lack of support for tuning multiple environments when...
## Description Closes #258. Implement [Truncated Quantile Critics](https://paperswithcode.com/paper/controlling-overestimation-bias-with) ## Types of changes - [ ] Bug fix - [ ] New feature - [x] New algorithm - [ ] Documentation...
## Problem Description Would you be interested in adding [Truncated Quantile Critics](https://paperswithcode.com/paper/controlling-overestimation-bias-with) to CleanRL? If so I can work on a PR. If you are interested, then I have a...
## Problem Description In many CleanRL scripts, a timestamp is used as a differentiator in the naming of the jobs: https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo.py#L134 In some very rare cases (when running on a...
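The collision described in the issue above can be illustrated with a minimal sketch. It assumes the run name is built from the environment id, experiment name, seed, and a whole-second timestamp; the helper names and the random-suffix mitigation are hypothetical and are not taken from the CleanRL code base or from the issue itself.

```python
import uuid


def timestamp_run_name(env_id: str, exp_name: str, seed: int, now: float) -> str:
    """Build a run name whose only differentiator is a whole-second timestamp.

    Assumed naming pattern, for illustration only.
    """
    return f"{env_id}__{exp_name}__{seed}__{int(now)}"


def unique_run_name(env_id: str, exp_name: str, seed: int, now: float) -> str:
    """Same pattern plus a short random suffix (hypothetical mitigation)."""
    base = timestamp_run_name(env_id, exp_name, seed, now)
    return f"{base}__{uuid.uuid4().hex[:8]}"


# Two jobs with identical settings that start within the same second.
start_a = 1_660_000_000.2
start_b = start_a + 0.4

name_a = timestamp_run_name("CartPole-v1", "ppo", 1, start_a)
name_b = timestamp_run_name("CartPole-v1", "ppo", 1, start_b)
collides = name_a == name_b  # both timestamps truncate to the same second

safe_a = unique_run_name("CartPole-v1", "ppo", 1, start_a)
safe_b = unique_run_name("CartPole-v1", "ppo", 1, start_b)
still_collides = safe_a == safe_b  # the random suffix makes this very unlikely
```

With a one-second timestamp resolution, any two jobs launched with the same settings inside the same second produce the same name; a random suffix is one possible way to separate them, at the cost of names that no longer sort purely by start time.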