rl
[Feature Request] Partial steps in batched envs
Motivation
In many scenarios we need to step only a subset of the batched envs. Examples include collecting complete trajectories from many envs that terminate asynchronously, and partial frame skip.
Solution
Serial and parallel envs could read an index key indicating which envs are to be reset or stepped. We need to decide whether this key will be a bool or a long tensor, what it will be named, and whether it will be private. Users will certainly need to mask the resulting data, so we'll also need to provide a mask indicating which entries are valid.
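To make the proposal concrete, here is a minimal sketch of the idea using plain Python lists instead of tensors so it runs without dependencies. Everything in it is hypothetical and is not the TorchRL API: the `ToyBatchedEnv` class, the `step_mask` argument (the bool variant of the index key), and the `valid` entry in the output are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBatchedEnv:
    """Hypothetical stand-in for a serial batched env; not the TorchRL API."""

    num_envs: int
    max_steps: List[int]  # per-env episode length, so envs finish asynchronously
    counts: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.counts = [0] * self.num_envs

    def step(self, step_mask: List[bool]) -> dict:
        """Step only the envs whose mask entry is True.

        Returns per-env observations, done flags, and a `valid` mask telling
        the caller which entries were actually produced by this call.
        """
        if len(step_mask) != self.num_envs:
            raise ValueError("step_mask must have one entry per env")
        for i, do_step in enumerate(step_mask):
            if do_step:
                self.counts[i] += 1
        return {
            "observation": list(self.counts),
            "done": [c >= m for c, m in zip(self.counts, self.max_steps)],
            "valid": list(step_mask),
        }


def collect_full_trajectories(env: ToyBatchedEnv) -> List[List[int]]:
    """Step every env until it is done, skipping the ones that already finished."""
    trajectories: List[List[int]] = [[] for _ in range(env.num_envs)]
    step_mask = [True] * env.num_envs
    while any(step_mask):
        out = env.step(step_mask)
        for i, valid in enumerate(out["valid"]):
            if valid:  # ignore stale data from skipped envs
                trajectories[i].append(out["observation"][i])
        # An env keeps stepping only while it has not reached done.
        step_mask = [not d for d in out["done"]]
    return trajectories


trajs = collect_full_trajectories(ToyBatchedEnv(num_envs=3, max_steps=[2, 4, 1]))
print(trajs)  # [[1, 2], [1, 2, 3, 4], [1]]
```

In this sketch the same bool mask serves both as the index key and as the validity mask returned to the user. A long-tensor index would need a separate conversion step to produce that mask, which may be one argument in favor of the bool option.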
Alternatives
Eventually we could also index batched envs directly, but for now that is a long way off.
Cc @albertbou92
Where would the index key be set? In a Transform, or would the environment itself set it?
I think the key could be a bool, named `padded_step` or `skipped_step` maybe?