Antonin RAFFIN
Hello, the best is to start with a working example: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/75b2de139927da26d5871aef9fd839632f73b296/sb3_contrib/common/envs/invalid_actions_env.py#L39. That being said, there might be a bug too. Tagging @kronion and @vwxyzjn as they actually worked with it.
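As a rough illustration of the idea behind that example (this is not the code of the linked file): as far as we know, `MaskablePPO` in sb3-contrib looks for an `action_masks()` method on the env that returns one boolean per action. The class `ToyMaskedEnv` and its parameters below are hypothetical, and the sketch skips the `gym.Env` base class and spaces to stay self-contained.

```python
import numpy as np


class ToyMaskedEnv:
    """Hypothetical toy env: a random subset of actions is valid at each step."""

    def __init__(self, n_actions=5, n_valid=2, seed=0):
        if not 0 < n_valid <= n_actions:
            raise ValueError("need 0 < n_valid <= n_actions")
        self.n_actions = n_actions
        self.n_valid = n_valid
        self.rng = np.random.default_rng(seed)
        self._mask = self._sample_mask()

    def _sample_mask(self):
        # True = action allowed, False = action masked out
        mask = np.zeros(self.n_actions, dtype=bool)
        valid = self.rng.choice(self.n_actions, size=self.n_valid, replace=False)
        mask[valid] = True
        return mask

    def action_masks(self):
        # One flat boolean entry per discrete action, shape (n_actions,)
        return self._mask.copy()

    def step(self, action):
        # Reward 1.0 for a valid action, 0.0 otherwise, then resample the valid set
        reward = 1.0 if self._mask[action] else 0.0
        self._mask = self._sample_mask()
        return reward
```

A real env would also return observations, `done` flags and so on; only the mask part matters here.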
> BUT: the shape of the mask is not (360,) or (1,360) but instead it is (128, 360)

This actually looks good to me: we need to retrieve one mask...
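To see why a (128, 360) mask can be fine, here is a minimal sketch assuming the leading 128 is the batch dimension, so row `i` is the mask for sample `i`. The helper `masked_greedy_actions` is hypothetical; it pushes invalid logits to a large negative value (similar in spirit to what sb3-contrib's masked distributions do, but not their code) and takes the greedy action per row.

```python
import numpy as np


def masked_greedy_actions(logits, masks):
    """logits, masks: (batch_size, n_actions); one boolean mask row per sample."""
    logits = np.asarray(logits, dtype=np.float64)
    masks = np.asarray(masks, dtype=bool)
    if logits.shape != masks.shape:
        raise ValueError("logits and masks must have the same shape")
    # Invalid actions get a huge negative logit so argmax never picks them
    masked = np.where(masks, logits, -1e8)
    return masked.argmax(axis=1)


batch_size, n_actions = 128, 360
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch_size, n_actions))
masks = rng.random((batch_size, n_actions)) < 0.1
masks[:, 0] = True  # make sure every row has at least one valid action
actions = masked_greedy_actions(logits, masks)  # shape (128,), one action per sample
```

Sampling instead of argmax would apply a softmax to the same masked logits; greedy selection just keeps the check simple.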
Hello, thanks for the proposal.

> Create a new learning algorithm. Basically DQN with two new hyperparameters. One to decide on the loss type (standard, Clipped, Reg), and the other for...
One last remark: do we really need both? It looks like DQNReg is both simpler and works as well as (or better than) the other variant on all the tasks, no?
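A sketch of what a `loss_type` switch could look like, to show how small the DQNReg change is. Assumptions: "standard" is taken to be the Huber (smooth L1) loss that SB3's DQN uses, and "reg" is the DQNReg loss as we recall it from the paper, `delta**2 + reg_weight * Q(s, a)` with `reg_weight = 0.1`. The names `dqn_loss` and `reg_weight` are hypothetical, and the Clipped variant is left out because its definition is not given here.

```python
def huber(delta, kappa=1.0):
    """Huber / smooth L1 loss for a single TD error."""
    a = abs(delta)
    return 0.5 * delta ** 2 if a <= kappa else kappa * (a - 0.5 * kappa)


def dqn_loss(q_values, targets, loss_type="standard", reg_weight=0.1):
    """Mean loss over a batch of Q(s, a) predictions and their TD targets."""
    if not q_values or len(q_values) != len(targets):
        raise ValueError("need non-empty, equally sized q_values and targets")
    terms = []
    for q, y in zip(q_values, targets):
        delta = q - y  # TD error
        if loss_type == "standard":
            terms.append(huber(delta))
        elif loss_type == "reg":
            # Squared TD error plus a small penalty on the Q-value itself
            terms.append(delta ** 2 + reg_weight * q)
        else:
            raise ValueError(f"unsupported loss_type: {loss_type!r}")
    return sum(terms) / len(terms)
```

For example, with `Q(s, a) = 1.0` and target `0.0`, the standard loss is 0.5 and the "reg" loss is 1.1.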
> Looking at Table 1 of the paper, it seems RoadRunner and Asteroids are good environments to test DQNReg. Results with these two environments should suffice for Atari experiments....
> ran all the tests in the tests folder and some tests fail because `observation_space` is None and fails a check (see below). I think what's happening is...
Hello, it looks interesting. Do you have a minimal code example showing how it works? Is the synchronization done after each step? In fact, I plan to have...
> Yes, the examples I made for testing live here.

Thanks =) but I'm afraid this won't work with the current SB3 implementation...
Hello, your issue is the same as @vwxyzjn's in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/25#issuecomment-922592839: you can use a `VecEnv`, but for now the action masker must be applied to each individual env, not on the...
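A minimal sketch of the "masker on each single env" pattern: every sub-env computes its own mask and the vectorized side only collects them, which gives one row per env. As far as we know, sb3-contrib's `get_action_masks` does something similar for a `VecEnv` (calling `action_masks` on each sub-env and stacking the results), and in practice each env would be wrapped with `ActionMasker` before vectorizing, e.g. via the `wrapper_class` argument of `make_vec_env`. The classes and helper below are hypothetical and avoid those libraries to stay self-contained.

```python
import numpy as np


class MaskedEnv:
    """Hypothetical single env that knows its own valid actions."""

    def __init__(self, n_actions, valid):
        self.n_actions = n_actions
        self.valid = list(valid)

    def action_masks(self):
        mask = np.zeros(self.n_actions, dtype=bool)
        mask[self.valid] = True
        return mask


def stacked_action_masks(envs):
    # Ask every sub-env for its own mask, then stack: shape (n_envs, n_actions)
    return np.stack([env.action_masks() for env in envs])


envs = [MaskedEnv(4, [0, 1]), MaskedEnv(4, [2]), MaskedEnv(4, [1, 3])]
masks = stacked_action_masks(envs)
```

Putting the masker on the `VecEnv` itself would require it to know about every sub-env's state, which is why the per-env placement is simpler.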
Hello, you should take a look at https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/74 and https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/75b2de139927da26d5871aef9fd839632f73b296/sb3_contrib/common/envs/invalid_actions_env.py#L39. I think the action mask is supposed to be flattened (@kronion, if it's not in the docs, we should update...
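For a `MultiDiscrete(nvec)` action space, "flattened" would mean, as we understand the sb3-contrib convention, concatenating one boolean mask per sub-action into a single 1D array of length `sum(nvec)` rather than passing a nested or 2D structure. The helper `flatten_multidiscrete_mask` is a hypothetical sketch of that layout.

```python
import numpy as np


def flatten_multidiscrete_mask(per_dim_masks, nvec):
    """Concatenate one boolean mask per MultiDiscrete dimension into a flat mask."""
    if len(per_dim_masks) != len(nvec):
        raise ValueError("need exactly one mask per MultiDiscrete dimension")
    for mask, n in zip(per_dim_masks, nvec):
        if len(mask) != n:
            raise ValueError(f"mask of length {len(mask)} does not match n={n}")
    # Result has shape (sum(nvec),): dimension 0's entries first, then dimension 1's, ...
    return np.concatenate([np.asarray(m, dtype=bool) for m in per_dim_masks])


nvec = [3, 4]
flat = flatten_multidiscrete_mask(
    [[True, False, True], [False, True, True, False]], nvec
)
```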