[Question] Noise annealing / scheduling

Open Gregwar opened this issue 2 years ago • 5 comments

Question

Just as DQN has an exploration schedule, is there an option to anneal/schedule the action noise for continuous environments?

Additional context

I know that this was done in the former keras-rl project (https://github.com/keras-rl/keras-rl/blob/master/rl/random.py#L10). I also realize that this would be yet another hyperparameter.

Checklist

  • [X] I have read the documentation (required)
  • [X] I have checked that there is no similar issue in the repo (required)

Gregwar, Apr 01 '22 08:04

Hello,

> do we have an option to anneal/schedule action noises for continuous environments?

Not out of the box, but you can define a schedule for it yourself. For instance, the old version of the zoo did this in https://github.com/araffin/rl-baselines-zoo/blob/master/utils/noise.py#L6; you then pass that object to the algorithm's action_noise parameter.
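As a rough illustration of the "noise object with its own schedule" idea, here is a minimal sketch of Gaussian noise whose standard deviation is annealed linearly over a fixed number of calls. The class name, the max_steps and final_sigma parameters, and the current_sigma helper are all hypothetical; the class only mimics the call-plus-reset() interface of the SB3 noise objects, and for real use it would probably subclass NormalActionNoise from stable_baselines3.common.noise instead.

```python
import numpy as np


class LinearlyAnnealedNormalNoise:
    """Gaussian action noise whose std moves linearly from sigma to final_sigma.

    Hypothetical class for illustration: it is callable and has reset(), like
    the SB3 action noise objects, but it does not inherit from them.
    """

    def __init__(self, mean, sigma, max_steps, final_sigma=None):
        self._mu = np.asarray(mean, dtype=np.float64)
        self._sigma = np.asarray(sigma, dtype=np.float64)
        # Assumption: anneal to zero noise unless a final value is given.
        if final_sigma is None:
            self._final_sigma = np.zeros_like(self._sigma)
        else:
            self._final_sigma = np.asarray(final_sigma, dtype=np.float64)
        self._max_steps = int(max_steps)
        self._step = 0

    def current_sigma(self):
        # Fraction of the annealing period already elapsed, capped at 1.
        t = min(1.0, self._step / self._max_steps)
        return (1.0 - t) * self._sigma + t * self._final_sigma

    def __call__(self):
        sigma = self.current_sigma()
        self._step += 1  # one call corresponds to one sampled action
        return np.random.normal(self._mu, sigma)

    def reset(self):
        # Called at episode boundaries; the annealing counter keeps running.
        pass
```

Note that the object has to count its own calls and be told max_steps up front, because it has no view of the training loop.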

You could also use a callback, which is probably the easiest option ;) It has access to both self.model.action_noise and self.num_timesteps.
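A minimal sketch of such a callback, under a few assumptions: the noise object stores its standard deviation in a private _sigma attribute (as NormalActionNoise does), the total number of timesteps is passed in by hand, and the class name and its parameters are hypothetical. The try/except fallback is only there so the snippet can be run without SB3 installed.

```python
try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:
    class BaseCallback:
        """Minimal stand-in so the sketch runs without SB3 installed."""

        def __init__(self, verbose=0):
            self.verbose = verbose
            self.model = None
            self.num_timesteps = 0


class ActionNoiseAnnealingCallback(BaseCallback):
    """Linearly scale the action noise std from 1x to final_scale x (hypothetical)."""

    def __init__(self, total_timesteps, final_scale=0.0, verbose=0):
        super().__init__(verbose)
        self.total_timesteps = int(total_timesteps)
        self.final_scale = float(final_scale)
        self._initial_sigma = None

    def _on_training_start(self):
        # Assumption: the std lives in the private `_sigma` attribute.
        self._initial_sigma = self.model.action_noise._sigma

    def _on_step(self):
        # Fraction of training done so far, capped at 1.
        progress = min(1.0, self.num_timesteps / self.total_timesteps)
        scale = 1.0 + progress * (self.final_scale - 1.0)
        # Rebind instead of mutating in place, so the initial value is kept.
        self.model.action_noise._sigma = scale * self._initial_sigma
        return True  # returning False would stop training
```

It would be passed as model.learn(total_timesteps=N, callback=ActionNoiseAnnealingCallback(N)). With several environments the noise is, as far as I know, wrapped in a VectorizedActionNoise, so the callback would have to update each wrapped noise object instead.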

Available action noise classes are in https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/noise.py

araffin, Apr 01 '22 09:04

Maybe we could adapt the code to accept callables, so that the lin_0.1 syntax would work?

Gregwar, Apr 07 '22 13:04

> Maybe we could adapt the code to accept callables so that the lin_0.1 syntax would work ?

You mean lin_0.1 in the RL Zoo?

> You could also use a callback that has access to self.model.action_noise which is probably the easiest option ;) (because it has access to self.num_timesteps too)

Because of that, I would rather implement it as a callback in the RL Zoo. Otherwise, you would need to pass max_timesteps to the action noise as well, and pass the remaining number of timesteps (remaining_num_timesteps) to the callable.
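To make the dependency concrete, here is a small sketch of what a lin_0.1 style schedule needs. linear_schedule follows the usual SB3 convention of a function of progress_remaining (going from 1 at the start of training to 0 at the end); parse_schedule and the progress_remaining helper are hypothetical names used only for illustration.

```python
from typing import Callable, Union


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Linear decay from initial_value (progress_remaining=1) to 0."""

    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value

    return schedule


def parse_schedule(spec: Union[str, float]) -> Callable[[float], float]:
    """Turn "lin_0.1" into a linear schedule; plain numbers become constants."""
    if isinstance(spec, str) and spec.startswith("lin_"):
        return linear_schedule(float(spec[len("lin_"):]))
    value = float(spec)
    return lambda progress_remaining: value


def progress_remaining(num_timesteps: int, max_timesteps: int) -> float:
    """What the caller must supply: it needs both counters."""
    return 1.0 - min(1.0, num_timesteps / max_timesteps)


sigma = parse_schedule("lin_0.1")
# A quarter of the way through training, 75% of the progress remains.
print(sigma(progress_remaining(250, 1000)))
```

The callable itself is simple, but something has to feed it num_timesteps and max_timesteps. A callback already sees those, whereas a standalone noise object does not.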

araffin, Apr 11 '22 15:04

> You mean lin_0.1 in the RL Zoo?

Yes.

> because of that, I would rather implement it as a callback in the RL Zoo. Otherwise, you need to pass a max_timesteps as input to the action noise too and pass remaining_num_timesteps to the callable.

If classes like NormalActionNoise supported a callable self._sigma instead of a scalar value, it could directly be a schedule in the zoo, couldn't it? Is that what you propose?

We could also make a callback that changes the noise properties, but that would mean implementing this in a separate way when there is already a pattern for it.

Gregwar, Apr 11 '22 16:04

> as callable instead of scalar value it could directly be a schedule in the zoo, isn't it? Is it what you propose?

Not really. I would rather define a callback that schedules this, because the noise object doesn't have access to much (which is good, let's keep it simple and isolated). A noise schedule, however, may depend on the progress remaining, the agent's performance, and so on, and a callback already has access to all of that.

araffin, May 31 '22 15:05