[Question] Not updating LSTM states during training
❓ Question
When training PPO-Recurrent over multiple epochs, the stored LSTM states are not updated even though the LSTM weights are. Is there a reason for this? Or is it just to save compute, and it does not affect the optimization process much?
https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L345-L349
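For reference, a minimal sketch of what I mean (illustrative only, not the actual sb3_contrib code): the initial states captured at rollout time are fed to the LSTM in every epoch, while the LSTM weights change between epochs.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8)
optimizer = torch.optim.SGD(lstm.parameters(), lr=1e-2)

obs = torch.randn(10, 1, 4)              # rollout of 10 steps, batch size 1
stored_states = (torch.zeros(1, 1, 8),   # initial (h, c) saved during
                 torch.zeros(1, 1, 8))   # data collection

for epoch in range(4):
    # The same stored_states every epoch, never recomputed with new weights.
    output, _ = lstm(obs, stored_states)
    loss = output.pow(2).mean()          # stand-in for the PPO loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```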
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.
> Is there a reason for this?

Simplicity.

> Or is it just to save compute, and it does not affect the optimization process much?

Yes.
The stored states are mostly used to get a better initialization of the LSTM hidden state at the start of each sequence. (Also, since PPO's clipped objective keeps each update small, the updated LSTM should not be too far in parameter space from the old LSTM used to collect the data, so the stale states remain a reasonable approximation.)
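To illustrate why reusing stored states is usually acceptable, here is a standalone PyTorch sketch (again, not the sb3_contrib code): after a small perturbation of the weights, standing in for one gradient step, the state the updated LSTM would produce differs only slightly from the one saved at collection time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8)
obs = torch.randn(10, 1, 4)  # a rollout of 10 steps, batch size 1

# States saved at collection time, e.g. the state after a warm-up segment.
with torch.no_grad():
    _, stored_states = lstm(obs[:5])

# Pretend one gradient step slightly changed the weights.
with torch.no_grad():
    for p in lstm.parameters():
        p.add_(0.01 * torch.randn_like(p))

# Recomputing the initial state with the updated weights gives a
# (slightly) different state than the stored one.
with torch.no_grad():
    _, fresh_states = lstm(obs[:5])

print("max |h_stored - h_fresh|:",
      (stored_states[0] - fresh_states[0]).abs().max().item())
```

The smaller the policy update, the smaller this gap, which is why skipping the recomputation saves a full extra forward pass per epoch without hurting optimization much.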