[Feature Request] Simplify the logic of sampling from rollout_buffer
🚀 Feature
In `sb3_contrib/common/recurrent/buffers.py`, `RecurrentRolloutBuffer()._get_samples()` samples from the rollout buffer in a way I find unintuitive. It splits one fixed-length training sequence (len=batch_size) into two variable-length pieces and then pads them so that both can be fed to the LSTM at the same time. The padding may introduce meaningless noise, and it makes debugging quite hard. I'm working on designing a new LSTM, but I spent a whole night just trying to understand the sampling logic in this code. I'd like to propose a more intuitive and widely used approach: randomly pick some fixed-length sequences from the rollout buffer, and don't split any sampled sequence further.
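To make the proposal concrete, here is a minimal NumPy sketch of the fixed-length sampling idea. It is not code from `sb3_contrib`: the function name `sample_fixed_length_sequences`, its parameters, and the assumed `(n_steps, n_envs, ...)` array layout are all hypothetical, and it ignores episode boundaries and LSTM hidden states, which a real implementation would have to handle.

```python
import numpy as np


def sample_fixed_length_sequences(data, seq_len, n_sequences, rng):
    """Sample contiguous windows of length seq_len.

    Hypothetical helper: `data` is assumed to have shape (n_steps, n_envs, ...).
    Returns an array of shape (n_sequences, seq_len, ...), plus the start step
    and environment index of every window.
    """
    n_steps, n_envs = data.shape[:2]
    if seq_len > n_steps:
        raise ValueError("seq_len must not exceed the number of stored steps")

    # One random start step and one random environment per sequence.
    starts = rng.integers(0, n_steps - seq_len + 1, size=n_sequences)
    envs = rng.integers(0, n_envs, size=n_sequences)

    # (n_sequences, seq_len) matrix of time indices, one row per window.
    time_idx = starts[:, None] + np.arange(seq_len)[None, :]

    # Advanced indexing: envs[:, None] broadcasts against time_idx.
    return data[time_idx, envs[:, None]], starts, envs


rng = np.random.default_rng(0)
# Toy buffer: 128 steps, 4 environments, 3-dimensional observations.
observations = np.arange(128 * 4 * 3, dtype=np.float32).reshape(128, 4, 3)
batch, starts, envs = sample_fixed_length_sequences(
    observations, seq_len=16, n_sequences=8, rng=rng
)
```

Every sampled sequence has the same length, so no padding or masking is needed in this sketch. The trade-off is that a window may cross an episode boundary, so the stored episode-start flags would presumably need to be sampled with the same indices to reset the LSTM state at the right step.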
Motivation
No response
Pitch
No response
Alternatives
No response
Additional context
No response
Checklist
- [x] I have checked that there is no similar issue in the repo
- [x] If I'm requesting a new feature, I have proposed alternatives
related: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/284#issuecomment-2766526133