stable-baselines3

[Question] How do I correctly manually reset the episode on a `rollout_end`?

Open npit opened this issue 1 year ago • 2 comments

❓ Question

Hello,

I am modifying an environment at selected training milestones, at the end of rollouts. After these modifications, I want any episode that was cut short when the rollout ended to be flushed, and the next rollout to begin with a fresh episode in a reset environment.

I'm assuming I just need to modify the self._last_xxx variables of the model, since the rollout collection will start anew. Is that correct? Will the following callback func suffice?

    def on_rollout_end(self):
        # start the next rollout from a freshly reset environment
        self.model._last_obs = self.training_env.reset()
        # replicate initializations from BaseAlgorithm._setup_learn
        self.model._last_episode_starts = np.ones((self.training_env.num_envs,), dtype=bool)
        if self.model._vec_normalize_env is not None:
            # the VecNormalize wrapper lives on the model, not on the callback
            self.model._last_original_obs = self.model._vec_normalize_env.get_original_obs()

Do you have any recommendations? Thanks!

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

Jul 04 '24 22:07 npit

In your environment, is the number of timesteps per episode fixed?

Jul 06 '24 07:07 qgallouedec

> In your environment, is the number of timesteps per episode fixed?

It is not. There is a max number of iterations per episode, but termination is conditional.

Jul 08 '24 08:07 npit
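
The reset described in the question can be sketched as a standalone helper. This is only a sketch, not a maintainer-confirmed answer. The function name `reset_rollout_state` is hypothetical, and it assumes the model exposes the private attributes `_last_obs`, `_last_episode_starts`, `_last_original_obs` and `_vec_normalize_env` mentioned in the question. Private attributes may change between releases.

```python
import numpy as np


def reset_rollout_state(model, vec_env):
    """Reset `vec_env` and point the model's rollout bookkeeping at the new episode.

    Hypothetical helper: `model` is assumed to expose the private attributes
    `_last_obs`, `_last_episode_starts`, `_last_original_obs` and
    `_vec_normalize_env` referred to in the question.
    """
    # The next rollout starts from the observation returned by reset().
    model._last_obs = vec_env.reset()
    # Flag every sub-environment as being at the start of an episode.
    model._last_episode_starts = np.ones((vec_env.num_envs,), dtype=bool)
    # With observation normalization, also refresh the unnormalized copy.
    normalize_env = getattr(model, "_vec_normalize_env", None)
    if normalize_env is not None:
        model._last_original_obs = normalize_env.get_original_obs()
```

Inside a callback, this could be called as `reset_rollout_state(self.model, self.training_env)` from the rollout-end hook. Keeping the logic in a plain function makes it possible to check it against stub objects without starting a training run.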
