rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
## Motivation It is not very clear how the TensorDict returned by the _step() function should be structured for a multi-agent environment. If there are two...
We still have a bunch of try/except in losses such as PPO to compute the entropy. We need to remove them for compile compatibility.
## Describe the bug dqn_cartpole from sota-implementations/dqn doesn't work. It crashes with: **ImportError: cannot import name 'Composite' from 'torchrl.data'** ## To Reproduce Just run dqn_cartpole.py, it will call utils_cartpole.py and then...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #2393 * __->__ #2392
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2393 * #2392
## Motivation Attempting to implement [Parallel Q Networks](https://www.researchgate.net/publication/382080747_Simplifying_Deep_Temporal_Difference_Learning) (online DQN without replay buffer or target networks). Uses QLambda returns. ## Solution TDLambdaEstimator expects `state_value` keys but we would now need...
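For readers unfamiliar with the Q-lambda returns mentioned above, here is a minimal pure-Python sketch of one common backward recursion for them over a single trajectory. It is not the library's `TDLambdaEstimator` and not the change proposed in this PR. The function name `q_lambda_returns` and all of its parameters are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def q_lambda_returns(
    rewards: Sequence[float],
    next_max_q: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.65,
) -> List[float]:
    """Compute lambda-returns for Q-learning over one trajectory (illustrative).

    rewards[t]    : reward received at step t
    next_max_q[t] : max over actions of Q(s_{t+1}, a), from the online network
    dones[t]      : True if s_{t+1} is terminal
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    # Bootstrap the recursion from the value of the last next-state.
    next_return = next_max_q[-1]
    for t in reversed(range(horizon)):
        if dones[t]:
            # No bootstrapping past a terminal transition.
            returns[t] = rewards[t]
        else:
            # Blend the one-step bootstrap with the longer lambda-return.
            returns[t] = rewards[t] + gamma * (
                (1.0 - lam) * next_max_q[t] + lam * next_return
            )
        next_return = returns[t]
    return returns
```

With `lam=0` this reduces to one-step Q-learning targets, and with `lam=1` to a discounted sum of rewards bootstrapped only at the end of the trajectory. Note that the recursion reads action values rather than a `state_value` entry.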
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2382
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2389
> Hi there. Besides the naming, what do you think of adding some metadata for users to populate? > > This could be useful for, for example, marking which keys...