stable-baselines3
[Question] How do I correctly manually reset the episode on a `rollout_end`?
❓ Question
Hello,
I am modifying an environment at selected training milestones, at the end of rollouts. After these modifications, I want any episode that was cut short by the end of the rollout to be discarded, and the next rollout to begin with a fresh episode and a reset environment.
I'm assuming I just need to overwrite the model's `self._last_*` attributes, since rollout collection will then start anew.
Is that correct? Will the following callback func suffice?
    def on_rollout_end(self):
        # start the next rollout from a freshly reset environment
        self.model._last_obs = self.training_env.reset()
        # replicate initializations from BaseAlgorithm._setup_learn
        self.model._last_episode_starts = np.ones((self.training_env.num_envs,), dtype=bool)
        if self.model._vec_normalize_env is not None:
            # the VecNormalize wrapper is stored on the model, not on the callback
            self.model._last_original_obs = self.model._vec_normalize_env.get_original_obs()
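To check the reset logic outside of a training run, here is a minimal sketch that moves it into a standalone helper and exercises it with stand-in objects. The helper name `reset_rollout_state` and the `_StubVecEnv` / `_StubModel` classes are hypothetical and not part of stable-baselines3; only the attribute names come from the snippet above. Since these are private attributes, they may change between versions.

```python
import numpy as np


def reset_rollout_state(model, vec_env):
    """Overwrite the model's cached rollout state with a fresh reset.

    Hypothetical helper: `model` and `vec_env` are duck-typed and only need
    the attributes used in the snippet above.
    """
    model._last_obs = vec_env.reset()
    # Mark every sub-environment as starting a new episode.
    model._last_episode_starts = np.ones((vec_env.num_envs,), dtype=bool)
    vec_normalize = getattr(model, "_vec_normalize_env", None)
    if vec_normalize is not None:
        # Keep the unnormalized observation in sync with the reset.
        model._last_original_obs = vec_normalize.get_original_obs()


class _StubVecEnv:
    """Stand-in for a vectorized env with 2 sub-envs and 3-dim observations."""

    num_envs = 2

    def reset(self):
        return np.zeros((self.num_envs, 3), dtype=np.float32)


class _StubModel:
    """Stand-in holding only the attributes touched by the helper."""

    _vec_normalize_env = None
    _last_obs = None
    _last_episode_starts = None
    _last_original_obs = None


model = _StubModel()
reset_rollout_state(model, _StubVecEnv())
```

Inside a callback, the same helper would presumably be called as `reset_rollout_state(self.model, self.training_env)` at the end of a rollout.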
Do you have any recommendations? Thanks!
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.
In your environment, is the number of timesteps per episode fixed?
It is not. There is a max number of iterations per episode, but termination is conditional.
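To illustrate why this matters, here is a toy simulation in pure Python (no stable-baselines3 involved; the function name, parameters, and numbers are made up for illustration). With a fixed rollout length and conditionally terminating episodes, a rollout generally begins partway through an episode unless the episode counter is reset at the rollout boundary.

```python
import random


def rollout_start_offsets(n_rollouts, n_steps, max_episode_steps, p_done,
                          reset_at_rollout_end, seed=0):
    """Return the in-episode step index at which each rollout begins.

    Simulates a single environment that auto-resets when an episode ends,
    either by a random (conditional) termination or by hitting the step cap.
    """
    rng = random.Random(seed)
    episode_step = 0
    offsets = []
    for _ in range(n_rollouts):
        offsets.append(episode_step)
        for _ in range(n_steps):
            episode_step += 1
            done = rng.random() < p_done or episode_step >= max_episode_steps
            if done:
                episode_step = 0  # environment auto-resets after an episode ends
        if reset_at_rollout_end:
            episode_step = 0  # discard the unfinished episode
    return offsets


without_reset = rollout_start_offsets(20, 64, 50, 0.02, reset_at_rollout_end=False)
with_reset = rollout_start_offsets(20, 64, 50, 0.02, reset_at_rollout_end=True)
```

With the manual reset, every entry of `with_reset` is 0, so each rollout starts on the first step of an episode. Without it, the entries of `without_reset` can be any step index below the cap.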