
dqn_agent.DqnAgent: epsilon_greedy not modifiable during training

Open · fede72bari opened this issue · 0 comments

It seems it is not possible to define a decaying epsilon during training for dqn_agent.DqnAgent. I have some direct and indirect evidence of this:

  1. I tried to define a custom decay() function and to pass it as epsilon_greedy at agent instantiation; printing the epsilon_greedy value from time to time during training, I can see that it stays at the starting value of 1, as if the epsilon_greedy parameter were not treated as callable (see the sketch after this list)
  2. therefore, I forced the epsilon_greedy value to change externally during the training loop with agent._epsilon_greedy = decay(step); printing agent._epsilon_greedy, I can see that the parameter really is changing, but the behavior is again as if the policy were completely random, keeping the initial value of 1
  3. as a counter-verification, I used the setup from point 1, starting the decay function at 0.011 and ending with a constant epsilon of 0.01. In this case I get the same training behavior as with a constant epsilon = 0.01, which is consistent with epsilon being frozen at the initial 0.011
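For reference, here is a minimal sketch of the two attempts described above. The specs, network, optimizer, and the decay() schedule are placeholders of my own, not code from the original report:

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# Toy specs so the sketch is self-contained: 4-dim observation, 2 actions.
observation_spec = tensor_spec.TensorSpec([4], tf.float32)
action_spec = tensor_spec.BoundedTensorSpec([], tf.int32, minimum=0, maximum=1)
time_step_spec = ts.time_step_spec(observation_spec)

q_net = q_network.QNetwork(observation_spec, action_spec)
train_step_counter = tf.Variable(0, dtype=tf.int64)

def decay(step, start=1.0, end=0.01, decay_steps=10_000):
    # Hypothetical linear schedule; the actual decay() is not shown in the report.
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# Attempt 1: link epsilon_greedy to the schedule at construction time.
agent = dqn_agent.DqnAgent(
    time_step_spec,
    action_spec,
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    epsilon_greedy=lambda: decay(int(train_step_counter.numpy())),
    train_step_counter=train_step_counter,
)

# Attempt 2: mutate the private attribute during the training loop.
agent._epsilon_greedy = decay(100)  # printed value changes, behavior does not
```

One plausible explanation (an assumption on my part, not confirmed in this thread): if the collect policy is wrapped in a tf.function (e.g. via common.function or a driver), a callable that returns a plain Python float is evaluated once at trace time and the result is baked into the graph as a constant, and mutating agent._epsilon_greedy after construction has no effect because the collect policy was already built with the original value.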

The conclusion I draw is that the randomness of the policy cannot be changed during training by updating the hyperparameter _epsilon_greedy, nor by linking it to a custom decay function. Is that true? If so, how can a decaying epsilon be defined?
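Not an answer from the thread, but for comparison: EpsilonGreedyPolicy, which DqnAgent's collect policy is built on, accepts a callable for epsilon, and having that callable return a tf.Variable keeps the read at run time rather than trace time. A minimal sketch of that pattern, reusing the toy specs and network from the sketch above; this is one plausible wiring under those assumptions, not a confirmed fix:

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent

# Reuses time_step_spec, action_spec, and q_net from the previous sketch.
epsilon = tf.Variable(1.0, trainable=False, dtype=tf.float32)

agent = dqn_agent.DqnAgent(
    time_step_spec,
    action_spec,
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    # The callable returns the variable itself, so a tf.function wrapping the
    # collect policy reads the variable's current value on every call instead
    # of freezing a Python float at trace time.
    epsilon_greedy=lambda: epsilon,
)

for step in range(20_000):
    # Linear decay from 1.0 to 0.01 over the first 10_000 steps.
    epsilon.assign(max(0.01, 1.0 - step * (0.99 / 10_000)))
    # ... collect experience with agent.collect_policy and call agent.train() ...
```

The design point is that the decay is expressed as a stateful variable updated by the training loop, rather than as Python-level control flow that a traced graph cannot see.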

fede72bari · Mar 08 '23 08:03