stable-baselines3
[Question] Noise annealing / scheduling
Question
Just as DQN has an exploration schedule, is there an option to anneal or schedule the action noise for continuous-action environments?
Additional context
I know that this was done in the former keras-rl project (https://github.com/keras-rl/keras-rl/blob/master/rl/random.py#L10). I also know that it would be yet another hyperparameter.
Checklist
- [X] I have read the documentation (required)
- [X] I have checked that there is no similar issue in the repo (required)
Hello,
> do we have an option to anneal / schedule action noises for continuous environments?
Not out of the box, but you can define a schedule for it, for instance as was done in the old version of the zoo (https://github.com/araffin/rl-baselines-zoo/blob/master/utils/noise.py#L6), and pass that object to the algorithm's action_noise parameter.
You could also use a callback that has access to self.model.action_noise, which is probably the easiest option ;) (because it has access to self.num_timesteps too).
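A minimal sketch of the callback route, under a few assumptions: a single environment (so that self.model.action_noise is the noise object itself rather than a vectorized wrapper), and a NormalActionNoise-style object that keeps its standard deviation in the private attribute _sigma. The names linear_sigma and NoiseAnnealingCallback are made up for this illustration, and the import is guarded only so that the schedule function can be tried without stable-baselines3 installed.

```python
def linear_sigma(num_timesteps, total_timesteps, initial_sigma, final_sigma=0.0):
    """Linearly interpolate sigma from initial_sigma to final_sigma."""
    # Fraction of training done, clipped to [0, 1].
    fraction = min(1.0, max(0.0, num_timesteps / total_timesteps))
    return initial_sigma + fraction * (final_sigma - initial_sigma)


try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:
    # Lets linear_sigma above be tried without stable-baselines3.
    BaseCallback = None


if BaseCallback is not None:

    class NoiseAnnealingCallback(BaseCallback):
        """Hypothetical callback that anneals the action noise during training."""

        def __init__(self, total_timesteps, initial_sigma, final_sigma=0.0, verbose=0):
            super().__init__(verbose)
            self.total_timesteps = total_timesteps
            self.initial_sigma = initial_sigma
            self.final_sigma = final_sigma

        def _on_step(self) -> bool:
            noise = self.model.action_noise
            if noise is not None:
                # Assumption: the noise object stores its scale in `_sigma`.
                # A scalar is enough here because it broadcasts against the mean.
                noise._sigma = linear_sigma(
                    self.num_timesteps,
                    self.total_timesteps,
                    self.initial_sigma,
                    self.final_sigma,
                )
            return True  # returning False would stop training
```

It would be passed to training as model.learn(total_timesteps=100_000, callback=NoiseAnnealingCallback(100_000, initial_sigma=0.1)). Since the callback owns the schedule, swapping the linear rule for one based on agent performance would not require touching the noise class.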
Available action noise classes are in https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/noise.py
Maybe we could adapt the code to accept callables so that the lin_0.1 syntax would work?
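For context, a schedule string such as lin_0.1 can be read as "linear schedule starting at 0.1". The sketch below shows one way such a string could be turned into a callable of the remaining training progress (1.0 at the start, 0.0 at the end). parse_schedule and linear_schedule are names chosen for this illustration and are not necessarily how the RL Zoo implements it.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start) down to 0.0 (end).
        return progress_remaining * initial_value

    return schedule


def parse_schedule(spec: str) -> Callable[[float], float]:
    """Turn a string like 'lin_0.1' into a schedule callable (hypothetical helper)."""
    kind, _, value = spec.partition("_")
    if kind != "lin" or not value:
        raise ValueError(f"Unsupported schedule spec: {spec!r}")
    return linear_schedule(float(value))
```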
> Maybe we could adapt the code to accept callables so that the lin_0.1 syntax would work ?
You mean lin_0.1 in the RL Zoo?
> You could also use a callback that has access to self.model.action_noise which is probably the easiest option ;) (because it has access to self.num_timesteps too)
Because of that, I would rather implement it as a callback in the RL Zoo. Otherwise, you need to pass a max_timesteps as input to the action noise too and pass remaining_num_timesteps to the callable.
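To illustrate the extra bookkeeping this implies, here is a dependency-free sketch of a noise object whose sigma is a callable of the remaining progress. Because the object knows nothing about training, it has to be given max_timesteps and count its own calls (one call per step, which assumes a single environment). ScheduledGaussianNoise and its arguments are hypothetical; an actual stable-baselines3 noise class would subclass ActionNoise and return a NumPy array rather than a list.

```python
import random
from typing import Callable, List, Optional


class ScheduledGaussianNoise:
    """Zero-mean Gaussian noise whose sigma follows a schedule (illustrative only)."""

    def __init__(
        self,
        n_actions: int,
        sigma_schedule: Callable[[float], float],
        max_timesteps: int,
        seed: Optional[int] = None,
    ) -> None:
        self.n_actions = n_actions
        self.sigma_schedule = sigma_schedule
        self.max_timesteps = max_timesteps
        self.n_calls = 0  # the object must track elapsed steps itself
        self._rng = random.Random(seed)

    def current_sigma(self) -> float:
        # Remaining progress: 1.0 at the start, 0.0 once max_timesteps is reached.
        progress_remaining = max(0.0, 1.0 - self.n_calls / self.max_timesteps)
        return self.sigma_schedule(progress_remaining)

    def __call__(self) -> List[float]:
        sigma = self.current_sigma()
        self.n_calls += 1
        return [self._rng.gauss(0.0, sigma) for _ in range(self.n_actions)]

    def reset(self) -> None:
        """Episode boundary hook; the step counter deliberately persists."""
```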
> You mean lin_0.1 in the RL Zoo?
Yes.
> because of that, I would rather implement it as a callback in the RL Zoo. Otherwise, you need to pass a max_timesteps as input to the action noise too and pass remaining_num_timesteps to the callable.
If classes like NormalActionNoise supported self._sigma as a callable instead of a scalar value, it could directly be a schedule in the zoo, couldn't it? Is that what you propose?
We could make a callback that changes the noise properties as well, but that would mean implementing this in a separate way when there is already a pattern for it.
> as callable instead of scalar value it could directly be a schedule in the zoo, isn't it? Is it what you propose?
Not really. I would rather define a callback that schedules this, because the noise object doesn't have access to much (which is good, let's keep it simple and isolated). A noise schedule, however, may depend on the progress remaining, the agent's performance, and so on, and that is information a callback already has access to.