Keras-FlappyBird
initial bias towards action=1?
Why does the network start with such a strong bias towards trying action 1 every timestep?
I only occasionally see action=0.
It looks like it would be difficult to break out of this pattern, since the agent receives reward = 0.1 for it before encountering the first pipe gap.
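A quick way to quantify the skew is to tally the chosen actions over the first few thousand steps, e.g. with a throwaway helper like this (hypothetical snippet, not part of the repo):

```python
from collections import Counter

def action_distribution(actions):
    """Return the fraction of steps on which each action was chosen."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: count / total for action, count in sorted(counts.items())}

# Example with a log that is mostly action=1:
print(action_distribution([1, 1, 1, 0, 1, 1, 1, 1, 0, 1]))
# -> {0: 0.2, 1: 0.8}
```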
During the initial stage of training, the agent simply performs random exploration... The network should be able to learn "don't flap too much" after training for a while.
@yanpanlau If the initial exploration is random, I would expect both actions to be roughly equally likely at first, but that isn't the case.
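For context, this is roughly the uniform exploration I'd expect (a minimal sketch with placeholder names like `select_action` and `ACTIONS`, not the actual training code from this repo). With epsilon at 1.0 during the observation phase, both actions should each be drawn about half the time, so a persistent bias toward action=1 suggests the greedy branch is already dominating:

```python
import random
import numpy as np

ACTIONS = 2  # 0 = do nothing, 1 = flap

def select_action(model, state, epsilon):
    """Epsilon-greedy action selection.

    With probability epsilon the action is drawn uniformly at random,
    so with epsilon = 1.0 (pure exploration) both actions should each
    show up roughly half the time. Otherwise take the argmax of the
    network's Q-value estimates for the current state.
    """
    if random.random() <= epsilon:
        return random.randrange(ACTIONS)   # uniform over both actions
    q_values = model.predict(state)        # shape (1, ACTIONS)
    return int(np.argmax(q_values[0]))
```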
How long is "a while"? I trained on a 1080 Ti overnight and it didn't improve at all, or if it did, the change was not noticeable. It seemed to perform almost the same actions as when it started training. The model always seems to crash at the very top of the first pipe. I tried training it from scratch, but that didn't help. I also tried adjusting the epsilon value, but it didn't make much of a difference. Is anyone else having this issue?
I just re-tested it, and it should converge after about 100,000 steps. Can you try with the latest code?
Same issue here as @wobeert described. It didn't change even after 622,000 steps. See below.
I fixed it! I had accidentally introduced a bug into the original code when creating the multi-GPU version for GPU Keras. Thanks!