NoisyNetDense noise added to evaluations

Open redknightlois opened this issue 4 years ago • 0 comments

I noticed some strange behavior when evaluating a trained environment. The environment is deterministic and has a 'death switch' that triggers once a certain number of actions have been taken, yet evaluating the same sequence gave me different results on each execution.

01111111111111101111111111111110112[1111]222111111111111111111111111111111
01111111111111101111111111111110112[1121]222111111111111111111111111111111

The proper behavior should be to not apply the exploration noise whenever you are not training.

In my case this is particularly bad, because it introduces 'exploration' at the boundary between classes and causes unexpected behavior when you reach it, whereas training tries to minimize those context switches.
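
To illustrate the behavior I would expect, here is a minimal NumPy sketch of a noisy dense layer that only samples noise while training. It is not the library's NoisyNetDense code: the class name `NoisyDenseSketch`, the `training` flag, the `sigma_init` default and the initialization scheme are all assumptions made for the example.

```python
import numpy as np


class NoisyDenseSketch:
    """Hypothetical noisy dense layer (not the library's NoisyNetDense).

    Training:   y = x @ (mu_w + sigma_w * eps_w) + (mu_b + sigma_b * eps_b)
    Evaluation: y = x @ mu_w + mu_b   (no noise is sampled)
    """

    def __init__(self, in_dim, out_dim, sigma_init=0.017, seed=0):
        # sigma_init and the uniform init bound are assumed values for the sketch.
        self.rng = np.random.default_rng(seed)
        bound = np.sqrt(3.0 / in_dim)
        self.mu_w = self.rng.uniform(-bound, bound, size=(in_dim, out_dim))
        self.mu_b = self.rng.uniform(-bound, bound, size=out_dim)
        self.sigma_w = np.full((in_dim, out_dim), sigma_init)
        self.sigma_b = np.full(out_dim, sigma_init)

    def __call__(self, x, training):
        if training:
            # Fresh Gaussian noise on every training-time forward pass.
            eps_w = self.rng.standard_normal(self.mu_w.shape)
            eps_b = self.rng.standard_normal(self.mu_b.shape)
            w = self.mu_w + self.sigma_w * eps_w
            b = self.mu_b + self.sigma_b * eps_b
        else:
            # Evaluation uses only the learned means, so it is deterministic.
            w, b = self.mu_w, self.mu_b
        return x @ w + b


layer = NoisyDenseSketch(in_dim=4, out_dim=3)
x = np.ones((1, 4))

# Two evaluation passes on the same input give identical outputs.
print(np.array_equal(layer(x, training=False), layer(x, training=False)))
# Two training passes on the same input differ because noise is resampled.
print(np.array_equal(layer(x, training=True), layer(x, training=True)))
```

With a switch like this, a deterministic environment replayed at evaluation time would always see the same Q-values for the same state, so the greedy action near a class boundary would not flip between runs.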

redknightlois, Jul 05 '19 17:07