rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
## Description Parent PR of the QMIX example algorithm
## Description Implements training script instantiations from structured Hydra configs - [x] Collectors #471 - [x] Envs and transforms #472 - [x] Replay buffers #493 - [x] Models #496 -...
## Description Support for MBPO ## Motivation and Context MBPO is a sample-efficient method that improves over SAC. ## Types of changes What types of changes does your code...
## Description Better set comparison, using `==` whenever possible. ## Motivation and Context Originally, we tested that the difference between two sets was empty and that their intersection was complete. `==` does the...
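As a rough illustration of the set-comparison change described above, the sketch below contrasts a difference-plus-intersection check with plain `==`. The helper names and the example key sets are hypothetical and are not taken from the PR itself; the only claim is that, for Python's built-in `set`, both checks agree.

```python
def same_sets_verbose(a: set, b: set) -> bool:
    # Older style (assumed): the symmetric difference is empty
    # and the intersection covers every element of `a`.
    return len(a.symmetric_difference(b)) == 0 and len(a.intersection(b)) == len(a)


def same_sets(a: set, b: set) -> bool:
    # Simpler style: built-in set equality checks the same thing.
    return a == b


# Hypothetical key sets, for illustration only.
expected = {"observation", "action", "reward"}
actual = {"reward", "action", "observation"}
missing_one = {"observation", "action"}

assert same_sets_verbose(expected, actual) == same_sets(expected, actual)
assert same_sets_verbose(expected, missing_one) == same_sets(expected, missing_one)
```

Besides being shorter, `==` in an `assert` tends to give a clearer failure message under test runners such as pytest, which can show the differing elements.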
## Describe the bug When training on `PettingZoo/MultiWalker-v9` with `Multi-Agent Soft Actor-Critic`, **all** losses (`loss_actor`, `loss_qvalue`, `loss_alpha`) explode after ~1M environment steps at most. This phenomenon occurs regardless of (reasonable)...
## Motivation I want to train using multiple GPUs on a single machine, but I can't find relevant tutorial documentation. Could you provide an example of training using multiple GPUs...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * (to be filled)