Antonin RAFFIN

Results 880 comments of Antonin RAFFIN

Hello, thanks for pointing that out. Might be a bug. I need to dig deeper, this code is overly complex even with all the comments 🙈

To refresh my memory, I've created some graphics (I should probably put them in SB3 doc later). First, we collect `n_steps * n_envs` transitions and then flatten it to be...
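The collect-then-flatten step can be sketched in NumPy. This is only an illustration under assumptions: the sizes, the `rollout` array and its fill values are made up, and the helper is written to mirror what `swap_and_flatten` in SB3's `common/buffers.py` appears to do (swap the step and env axes, then merge them), not copied from it.

```python
import numpy as np

# Hypothetical sizes, chosen only for the illustration
n_steps, n_envs, obs_dim = 4, 3, 2

# rollout[t, e] stores [env index, step index] so the ordering is easy to inspect
rollout = np.zeros((n_steps, n_envs, obs_dim))
for t in range(n_steps):
    for e in range(n_envs):
        rollout[t, e] = [e, t]


def swap_and_flatten(arr: np.ndarray) -> np.ndarray:
    """Reshape (n_steps, n_envs, ...) into (n_steps * n_envs, ...), env-major."""
    shape = arr.shape
    if len(shape) < 3:
        # Add a trailing feature dimension for arrays such as rewards
        shape = (*shape, 1)
    # Swapping axes first keeps each env's steps contiguous after the reshape
    return arr.swapaxes(0, 1).reshape(shape[0] * shape[1], *shape[2:])


flat = swap_and_flatten(rollout)
print(flat.shape)  # (12, 2)
print(flat[:5])    # the first n_steps rows belong to env 0, then env 1 starts
```

With this ordering, the flattened buffer holds all steps of env 0, then all steps of env 1, and so on, so a contiguous slice can still cross an env boundary.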

> To be more clear, why the line 240 in common.recurrent.buffers is like "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds])" instead of "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds] or env_change[batch_inds])"?

After looking at it, it should probably be like: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/00a401db2c0bcfe8410fba2c4df1d001909e59e3/sb3_contrib/common/recurrent/buffers.py#L82 We...

fyi, I created a fix some weeks ago: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/290 but I didn't have much time to benchmark it (so far, I couldn't see any significant changes), so if someone has...

Hello,

> I also tried to profile my code with [py-spy](https://github.com/benfred/py-spy), and I found that MaskablePPO spent many extra time in [these lines](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/common/maskable/policies.py#L347-L349)

at least the slow down is where...

> Is there a more SB3 native way for using a custom exploration schedule with additional custom arguments?

For additional args, you will need to subclass DQN.

Hello, the env checker should be updated, SB3 doesn't support `spaces.Graph`. Duplicate of https://github.com/DLR-RM/stable-baselines3/issues/1723 and others (like https://github.com/DLR-RM/stable-baselines3/issues/1280)

Hello, what do you mean exactly by single sample? Maybe you mean what we do for PPO? https://github.com/DLR-RM/stable-baselines3/blob/656de97269c9e3051d3bbfb3f5f328d486867bd8/stable_baselines3/common/buffers.py#L63-L67

I'll try to have a look at it later. In the meantime, I've added some graphics here: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/284#issuecomment-2766526133

`assert len(set(torch.unique(sample.lstm_states.pi[0]))) == 1` checks that all samples from a minibatch come from the same env. If you look at the graphic in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/284#issuecomment-2766526133, this is intentionally not the case....