dqn_agent.DqnAgent: epsilon_greedy not modifiable during training
It seems it is not possible to define a decaying epsilon during training for dqn_agent.DqnAgent. I have some direct and indirect evidence of this:
- I defined a custom decay() function and passed it as epsilon_greedy when instantiating the agent; printing the epsilon_greedy value from time to time during training, I can see that it stays at the starting value of 1, as if the epsilon_greedy parameter were not treated as a callable
- I then forced the epsilon_greedy value externally during training with agent._epsilon_greedy = decay(step); printing agent._epsilon_greedy shows the parameter really is changing, but the behavior is still that of a completely random policy, as if the initial value of 1 were kept
- as a counter-check, I reused the setup from point 1, but with the decay function starting at 0.011 and ending at a constant epsilon of 0.01. In this case, training behaves the same as with a constant epsilon = 0.01, i.e. it keeps the initial 0.011
The conclusion I draw is that the randomness of the policy cannot be changed during training, neither by updating the _epsilon_greedy hyperparameter externally nor by linking it to a custom decay function. Is that true? If so, how can a decaying epsilon be defined?
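For reference, here is a minimal sketch of the pattern I was trying to implement: a zero-argument callable bound to a mutable step counter, so that the policy would see a new epsilon each time it is queried. This is plain Python with no TF-Agents dependency (the `StepCounter` class and `make_linear_decay` helper are my own illustrative names, standing in for the agent's train_step_counter and my decay() function):

```python
class StepCounter:
    """Stands in for the agent's mutable train-step counter."""
    def __init__(self):
        self.value = 0

def make_linear_decay(counter, start=1.0, end=0.01, decay_steps=1000):
    """Return a zero-argument callable that linearly decays epsilon
    from `start` to `end` over `decay_steps` steps of `counter`."""
    def decay():
        frac = min(counter.value / decay_steps, 1.0)
        return start + frac * (end - start)
    return decay

step = StepCounter()
epsilon_fn = make_linear_decay(step)

# The intent was to pass the function itself (epsilon_fn, not epsilon_fn())
# as epsilon_greedy at agent instantiation, so the value is re-evaluated
# as step.value advances during training.
print(epsilon_fn())   # at step 0: the starting value 1.0
step.value = 1000
print(epsilon_fn())   # after decay_steps: the final value 0.01
```

My question is whether DqnAgent actually re-evaluates such a callable during training, or only reads it once at construction time.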