rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
## Motivation Oftentimes we only want to train an algorithm until it has learned the intended behavior, and a total number of frames is only a proxy for the stopping...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2358
## Without the primer, the collector does not feed any hidden state to the policy In the [RNN tutorial](https://github.com/pytorch/rl/blob/main/tutorials/sphinx-tutorials/dqn_with_rnn.py) it is stated that the primer is optional and it is...
- [ ] `break_when_all_done` in `env.rollout()` #2355 - [ ] Partial steps in env #2356 - [ ] `BatchedEnv`: pass the indices of envs where a step should be done...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #2359 * #2358 * __->__ #2354 * #2307 * #2306 * #2305 * #2304
## Motivation #2355 would be much cleaner if we could do partial steps in batched or stateless envs. ### Design question - Should we index the batched env to make...
## Description Add LLM Collector ## Motivation and Context #2872 ## Types of changes What types of changes does your code introduce? Remove all that do not apply: - [...
## Describe the bug I see very cool advancements in the direction of LLM RL training in the repo, awesome work! :) After playing a bit with the LLMEnv I...
## Motivation We need a [collector](https://pytorch.org/rl/stable/reference/generated/torchrl.collectors.SyncDataCollector.html?highlight=syncdatacollector) that fits the LLM space well. We will need to greatly simplify the rollout function - I would rewrite it from scratch. The LLMEnv...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2865