
[Question] Not updating lstm states during training

Open · abhinavj98 opened this issue 1 year ago · 1 comment

❓ Question

When training PPO-Recurrent over multiple epochs, the stored LSTM hidden states are not recomputed, even though the LSTM weights get updated each epoch. Is there a reason for this? Or is it just to save compute, on the assumption that it does not affect the optimization much?

https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L345-L349
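For reference, the linked lines do roughly the following (a paraphrased sketch, not the exact source): `evaluate_actions` is called with the `lstm_states` that were recorded in the rollout buffer at collection time, and those stored states are never refreshed between epochs:

```python
# Paraphrased sketch of the RecurrentPPO training loop (may differ
# slightly from the actual source): the initial LSTM states come from
# rollout_data, i.e. they were recorded during data collection and are
# reused unchanged in every training epoch.
values, log_prob, entropy = self.policy.evaluate_actions(
    rollout_data.observations,
    actions,
    rollout_data.lstm_states,     # stale states from collection time
    rollout_data.episode_starts,
)
```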


abhinavj98 · Nov 21 '24

Is there a reason for this?

Simplicity.

Or is it just to save compute, on the assumption that it does not affect the optimization much?

Yes.

They are mostly used to get a better initialization of the hidden state of the LSTM than starting from zeros. (Also, the updated LSTM should not be too far in parameter space from the old LSTM used to collect the data, so the stale states remain a close approximation.)
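To make the trade-off concrete: recomputing the initial states each epoch would mean re-running the updated LSTM sequentially over every transition preceding a minibatch. A minimal sketch of what that would involve, using a hypothetical helper in plain PyTorch (not part of the SB3 API):

```python
import torch
import torch.nn as nn

def recompute_initial_state(lstm: nn.LSTM, preceding_feats: torch.Tensor):
    """Hypothetical helper: re-run the *updated* LSTM over the features
    of the transitions preceding a minibatch to get fresh initial states.
    RecurrentPPO deliberately skips this and reuses the states stored
    at rollout-collection time.

    preceding_feats: LSTM inputs, shape (seq_len, batch, input_size).
    """
    batch_size = preceding_feats.shape[1]
    # Start from a zero state at the episode boundary
    h = torch.zeros(lstm.num_layers, batch_size, lstm.hidden_size)
    c = torch.zeros(lstm.num_layers, batch_size, lstm.hidden_size)
    with torch.no_grad():
        # One extra sequential forward pass per minibatch per epoch:
        # this is the compute that reusing the stale states saves.
        _, (h, c) = lstm(preceding_feats, (h, c))
    return h, c
```

Skipping this recomputation saves a full sequential pass over the collected data in every epoch.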

araffin · Nov 27 '24