REnforce

The bandit tests are flaky

Open NivenT opened this issue 7 years ago • 0 comments

As a bare-minimum check that a new RL algorithm is plausibly implemented correctly, each one is tested on the N-armed bandit problem. This environment is about as simple as RL environments get, so every algorithm should be able to "solve" it without trouble. This is currently not the case, as some algorithms (I think just CrossEntropy) do not consistently pass. More care needs to be taken in choosing hyperparameters for these tests so they aren't flaky.
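To make the idea concrete, here is a minimal sketch of what a less flaky bandit test could look like. It is not REnforce's code: the simplified categorical cross-entropy method, the helper names (`make_bandit`, `cross_entropy_bandit`, `solves_bandit`), the arm means, and all hyperparameters are assumptions chosen for illustration. The two ideas it shows are seeding the random number generator so each run is reproducible, and asserting on a success rate over several seeds instead of on a single run.

```python
import random

def make_bandit(means, rng):
    """Return a pull(arm) function giving Gaussian rewards around the given means."""
    def pull(arm):
        return rng.gauss(means[arm], 1.0)
    return pull

def cross_entropy_bandit(n_arms, pull, rng, iterations=50, batch=100,
                         elite_frac=0.2, smoothing=0.7):
    """Simplified cross-entropy method over a categorical distribution on arms.

    All hyperparameter defaults are illustrative assumptions.
    """
    probs = [1.0 / n_arms] * n_arms
    n_elite = max(1, int(batch * elite_frac))
    for _ in range(iterations):
        # Sample a batch of arms from the current distribution and pull each one.
        arms = rng.choices(range(n_arms), weights=probs, k=batch)
        samples = sorted(((pull(a), a) for a in arms), reverse=True)
        # Keep the arms behind the highest rewards and refit the distribution.
        elite = [a for _, a in samples[:n_elite]]
        counts = [elite.count(a) / n_elite for a in range(n_arms)]
        # Smooth the update so one unlucky batch cannot wipe out an arm.
        probs = [smoothing * c + (1 - smoothing) * p
                 for c, p in zip(counts, probs)]
    return probs

def solves_bandit(seed):
    """Run one seeded trial and report whether the best arm ended up most likely."""
    rng = random.Random(seed)  # fixed seed makes the trial reproducible
    means = [0.0, 0.5, 2.0, 1.0]  # hypothetical arm means; arm 2 is the best
    probs = cross_entropy_bandit(len(means), make_bandit(means, rng), rng)
    return max(range(len(means)), key=probs.__getitem__) == 2

# Assert on an aggregate over several seeds rather than on one lucky run.
success_rate = sum(solves_bandit(seed) for seed in range(20)) / 20
```

A test built this way would check something like `success_rate >= 0.9`. The threshold, the number of seeds, and the gap between arm means are the knobs to tune if a given algorithm still fails intermittently.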

NivenT, Aug 03 '17 08:08