[Question] Not updating LSTM states during training
❓ Question
When training PPO-Recurrent over multiple epochs, the stored LSTM states are not updated even though the LSTM weights are. Is there a reason for this? Or is it just to save compute, and it does not affect the optimization process much?
https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L345-L349
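For reference, a minimal sketch of what I mean (illustrative only, not the actual sb3_contrib code): the initial states captured at rollout time are fed to the LSTM in every epoch, while the LSTM weights change between epochs.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8)
optimizer = torch.optim.SGD(lstm.parameters(), lr=1e-2)

obs = torch.randn(10, 1, 4)              # rollout of 10 steps, batch size 1
stored_states = (torch.zeros(1, 1, 8),   # initial (h, c) saved during
                 torch.zeros(1, 1, 8))   # data collection

for epoch in range(4):
    # The same stored_states every epoch, never recomputed with new weights.
    output, _ = lstm(obs, stored_states)
    loss = output.pow(2).mean()          # stand-in for the PPO loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```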
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.
> Is there a reason for this?

Simplicity.

> Or is it just to save compute, and it does not affect the optimization process much?

Yes.
The stored states are mostly used to get a better initialization of the LSTM hidden state at the start of each sequence. (Also, since PPO's clipped objective keeps each update small, the updated LSTM should not be too far in parameter space from the old LSTM used to collect the data, so the stale states remain a reasonable approximation.)
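To illustrate why reusing stored states is usually acceptable, here is a standalone PyTorch sketch (again, not the sb3_contrib code): after a small perturbation of the weights, standing in for one gradient step, the state the updated LSTM would produce differs only slightly from the one saved at collection time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8)
obs = torch.randn(10, 1, 4)  # a rollout of 10 steps, batch size 1

# States saved at collection time, e.g. the state after a warm-up segment.
with torch.no_grad():
    _, stored_states = lstm(obs[:5])

# Pretend one gradient step slightly changed the weights.
with torch.no_grad():
    for p in lstm.parameters():
        p.add_(0.01 * torch.randn_like(p))

# Recomputing the initial state with the updated weights gives a
# (slightly) different state than the stored one.
with torch.no_grad():
    _, fresh_states = lstm(obs[:5])

print("max |h_stored - h_fresh|:",
      (stored_states[0] - fresh_states[0]).abs().max().item())
```

The smaller the policy update, the smaller this gap, which is why skipping the recomputation saves a full extra forward pass per epoch without hurting optimization much.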