DeepLearningFlappyBird
question on freezing target network
Hi @yenchenlin1994, love your implementation! I went through your code and I can't seem to find where you've frozen the target network. Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch? Wouldn't that hurt your convergence rate badly?
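For anyone landing here: the question is about the standard DQN trick of keeping a second, frozen copy of the network for computing bootstrap targets, and only syncing it to the online network every C steps. A minimal NumPy-only sketch of the idea (hypothetical, not this repo's actual code, with a toy linear Q-function standing in for the CNN):

```python
import numpy as np

class TinyQNet:
    """Toy linear Q-function, Q(s) = s @ W, standing in for the DQN."""
    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_features, n_actions))

    def q_values(self, s):
        return s @ self.W

def train_step(online, target, s, a, r, s_next, gamma=0.99, lr=0.01):
    # The bootstrap target uses the *frozen* target network, not the
    # online one -- this is the freezing the thread is asking about.
    td_target = r + gamma * np.max(target.q_values(s_next))
    td_error = td_target - online.q_values(s)[a]
    # Gradient of 0.5 * td_error**2 w.r.t. W[:, a] is -td_error * s.
    online.W[:, a] += lr * td_error * s
    return td_error

SYNC_EVERY = 100  # "C" in the DQN paper; the target is frozen in between

online = TinyQNet(n_features=4, n_actions=2)
target = TinyQNet(n_features=4, n_actions=2)
target.W = online.W.copy()  # start identical

rng = np.random.default_rng(1)
for step in range(500):
    s, s_next = rng.normal(size=4), rng.normal(size=4)
    train_step(online, target, s, a=int(rng.integers(2)), r=1.0, s_next=s_next)
    if step % SYNC_EVERY == 0:
        target.W = online.W.copy()  # the only time the target ever changes
```

Without the freeze (i.e. computing `td_target` from `online` directly, which is what this repo appears to do), the regression target moves on every gradient step, which is the instability the question is pointing at.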
Hello, Yeah you are right. Actually I got a reimplemented version. Will submit soon! On Wed, Apr 20, 2016 at 17:46 Code-Deep-Blue [email protected] wrote:
Hi again, I'm trying to reproduce the results in Keras and have trained for ~400,000 steps, but the bird is unable to cross the first pipe consistently. My loss is low, though (~0.2), and the Q-values are in the range [0, 8]. How long did it take before it actually started working, i.e., crossing the first pipe consistently?
I can't remember the exact number of iterations, but it was no more than ~1,000,000 steps.
I still can't find the target-network freezing in the current version's code. Does it really have no effect?
@hashbangCoder I ran into the same problem: the silly bird keeps flying to the top of the screen... Did you fix it?
I also couldn't find the target-network freezing code. But thanks for your code, it's been helpful for me.
I wrote a version based on this repo with a frozen target network: FlappyBird_DQN_with_target_network
Here is another repo with a target network: https://github.com/patrick-12sigma/DRL_FlappyBird
I made the target network an option. You can turn it on and off and experiment to see how much it affects training convergence.
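A tiny sketch of what such an on/off option could look like (the function name and flag are hypothetical, not necessarily what that repo uses): when the option is off, the "target" is just an alias of the online weights, so the bootstrap target tracks every gradient step; when it is on, the target is a detached snapshot that only changes when explicitly re-synced.

```python
import numpy as np

def make_target(online_W, use_target_network):
    """Return the weights used for computing bootstrap targets."""
    if use_target_network:
        return online_W.copy()   # frozen snapshot; only changes on explicit sync
    return online_W              # alias: follows every online update immediately

W = np.zeros((4, 2))
frozen = make_target(W, use_target_network=True)
live = make_target(W, use_target_network=False)

W[0, 0] = 1.0  # simulate one online gradient update
# `frozen` still holds the old value; `live` already reflects the update.
```

Comparing training curves under both settings is a nice way to see how much target freezing actually matters for this game.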
I refactored the network into a class and added some logging functionality to track the training process. I also borrowed the human-play function from @initial-h. Thanks!