NoisyNetDense: exploration noise is applied during evaluation
I was seeing some strange behavior when evaluating an agent trained on my environment. The environment is deterministic and has a 'death switch' that ends the episode once a certain number of actions has been taken, yet evaluating the same sequence gave me different results on each run:
```
01111111111111101111111111111110112[1111]222111111111111111111111111111111
01111111111111101111111111111110112[1121]222111111111111111111111111111111
```
The expected behavior is that the exploration noise is not applied when you are not training.
In my case this is particularly harmful, because it introduces 'exploration' at the boundaries between classes and causes unexpected behavior when the agent reaches them, whereas training tries to minimize exactly those context switches.
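To illustrate the expected behavior, here is a minimal pure-Python sketch of a noisy linear layer with factorized Gaussian noise that samples noise only when a `training` flag is set and otherwise uses just the mean weights. The class `NoisyLinear`, its parameters (`sigma0`, `seed`) and the explicit `training` argument are hypothetical and are not the NoisyNetDense API; in Keras the analogous hook would be the `training` argument of `Layer.call`, but this sketch does not show how NoisyNetDense itself handles it.

```python
import math
import random


class NoisyLinear:
    """Hypothetical minimal noisy linear layer (NOT the NoisyNetDense API).

    y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b), where the
    factorized noise eps is resampled on every call, but only in training.
    """

    def __init__(self, in_features, out_features, sigma0=0.5, seed=None):
        self.rng = random.Random(seed)
        bound = 1.0 / math.sqrt(in_features)
        # Mean parameters, initialized uniformly in [-1/sqrt(p), 1/sqrt(p)]
        self.mu_w = [[self.rng.uniform(-bound, bound) for _ in range(in_features)]
                     for _ in range(out_features)]
        self.mu_b = [self.rng.uniform(-bound, bound) for _ in range(out_features)]
        # Noise scales, initialized to sigma0 / sqrt(p)
        s = sigma0 / math.sqrt(in_features)
        self.sigma_w = [[s] * in_features for _ in range(out_features)]
        self.sigma_b = [s] * out_features

    @staticmethod
    def _f(x):
        # f(x) = sign(x) * sqrt(|x|), used to build factorized noise
        return math.copysign(math.sqrt(abs(x)), x)

    def __call__(self, x, training):
        n_out, n_in = len(self.mu_w), len(x)
        if training:
            # Training: draw fresh noise, eps_w[i][j] = f(e_i) * f(e_j)
            eps_in = [self._f(self.rng.gauss(0.0, 1.0)) for _ in range(n_in)]
            eps_out = [self._f(self.rng.gauss(0.0, 1.0)) for _ in range(n_out)]
        else:
            # Evaluation: no noise, so the output depends only on the means
            eps_in = [0.0] * n_in
            eps_out = [0.0] * n_out
        y = []
        for i in range(n_out):
            acc = self.mu_b[i] + self.sigma_b[i] * eps_out[i]
            for j in range(n_in):
                w = self.mu_w[i][j] + self.sigma_w[i][j] * eps_out[i] * eps_in[j]
                acc += w * x[j]
            y.append(acc)
        return y


layer = NoisyLinear(3, 2, seed=0)
x = [1.0, -0.5, 2.0]
print(layer(x, training=False) == layer(x, training=False))  # deterministic
print(layer(x, training=True), layer(x, training=True))      # noisy
```

With `training=False` the same input always maps to the same output, which is what a deterministic evaluation like the one above needs; with `training=True` each call perturbs the weights, which is the exploration that should only happen during training.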