Keras-FlappyBird
initial bias towards action=1?
Why does the network start with such a strong bias towards trying action 1 every timestep?
I only occasionally see action=0.
It looks like it would be difficult to break out of this pattern, since the agent receives reward = 0.1 for it before encountering the first pipe gap.
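A quick way to quantify the skew is to tally the chosen actions over the first few thousand steps, e.g. with a throwaway helper like this (hypothetical snippet, not part of the repo):

```python
from collections import Counter

def action_distribution(actions):
    """Return the fraction of steps on which each action was chosen."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: count / total for action, count in sorted(counts.items())}

# Example with a log that is mostly action=1:
print(action_distribution([1, 1, 1, 0, 1, 1, 1, 1, 0, 1]))
# -> {0: 0.2, 1: 0.8}
```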
During the initial stage of training, the agent simply performs random exploration... The network should be able to learn "don't flap too much" after training for a while.
@yanpanlau If the initial exploration is random, I would expect both actions to be roughly equally likely at first, but that isn't the case.
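For context, this is roughly the uniform exploration I'd expect (a minimal sketch with placeholder names like `select_action` and `ACTIONS`, not the actual training code from this repo). With epsilon at 1.0 during the observation phase, both actions should each be drawn about half the time, so a persistent bias toward action=1 suggests the greedy branch is already dominating:

```python
import random
import numpy as np

ACTIONS = 2  # 0 = do nothing, 1 = flap

def select_action(model, state, epsilon):
    """Epsilon-greedy action selection.

    With probability epsilon the action is drawn uniformly at random,
    so with epsilon = 1.0 (pure exploration) both actions should each
    show up roughly half the time. Otherwise take the argmax of the
    network's Q-value estimates for the current state.
    """
    if random.random() <= epsilon:
        return random.randrange(ACTIONS)   # uniform over both actions
    q_values = model.predict(state)        # shape (1, ACTIONS)
    return int(np.argmax(q_values[0]))
```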
How long is "a while"? I trained on a 1080 Ti overnight and it didn't improve at all, or if it did, the change was not noticeable. It seemed to perform almost the same actions as when it started training. The model always seems to crash at the very top of the first pipe. I tried training it from scratch, but that didn't help. I also tried adjusting the epsilon value, but it didn't make much of a difference. Is anyone else having this issue?
I just re-tested it, and it should converge after about 100,000 steps. Can you try with the latest code?
Same issue here as @wobeert described. It didn't change even after 622,000 steps. See below.
I fixed it! I had accidentally introduced a bug into the original code when creating the multi-GPU version for GPU Keras. Thanks!