[Feature Request] Simplify the logic of sampling from rollout_buffer

Open G1NO3 opened this issue 5 months ago • 1 comment

🚀 Feature

In RecurrentRolloutBuffer._get_samples() (sb3_contrib/common/recurrent/buffers.py), I think the way samples are drawn from the rollout_buffer is rather convoluted. One fixed-length training sequence (len=batch_size) is split into two variable-length pieces, which are then padded so that both can be fed to the LSTM at the same time. The padding may introduce meaningless noise, and it makes debugging quite hard. I'm working on designing a new LSTM and spent a whole night trying to understand the sampling logic in this code. I'd like to propose a more intuitive and more common approach: randomly pick fixed-length sequences from the rollout buffer, and don't split any sampled sequence further.
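
A minimal sketch of the proposed sampling, assuming a single-environment buffer stored as flat NumPy arrays. The function name, array names, and shapes below are hypothetical and are not the sb3_contrib API; this only illustrates the idea of drawing fixed-length windows without splitting or padding.

```python
import numpy as np


def sample_sequence_indices(buffer_size, seq_len, n_sequences, rng):
    """Return an (n_sequences, seq_len) array of consecutive buffer indices.

    Hypothetical helper: each row is one fixed-length window whose start
    is drawn uniformly so that the window fits inside the buffer.
    """
    if seq_len > buffer_size:
        raise ValueError("seq_len must not exceed buffer_size")
    # High bound is exclusive, so the last valid start is buffer_size - seq_len.
    starts = rng.integers(0, buffer_size - seq_len + 1, size=n_sequences)
    return starts[:, None] + np.arange(seq_len)[None, :]


rng = np.random.default_rng(0)

# Toy buffer (assumed layout): one environment, time along the first axis.
buffer_size, obs_dim = 128, 4
observations = rng.normal(size=(buffer_size, obs_dim))
episode_starts = np.zeros(buffer_size, dtype=bool)
episode_starts[[0, 50, 90]] = True

idx = sample_sequence_indices(buffer_size, seq_len=16, n_sequences=8, rng=rng)
obs_batch = observations[idx]        # shape (8, 16, 4), no padding needed
starts_batch = episode_starts[idx]   # shape (8, 16), per-step episode flags
```

Every window has the same length, so the batch can be passed to the LSTM as a dense tensor. One caveat: a window can straddle an episode boundary, so the per-step episode_starts flags would still be needed to reset the hidden state inside the window.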

Checklist

[x] I have checked that there is no similar issue in the repo
[x] If I'm requesting a new feature, I have proposed alternatives

Jul 22 '25 16:07 G1NO3

related: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/284#issuecomment-2766526133

Jul 22 '25 17:07 araffin