DeepLearningFlappyBird

Question on freezing target network

Open hashbangCoder opened this issue 8 years ago • 8 comments

Hi @yenchenlin1994, love your implementation! I went through your code and I can't seem to find where you've frozen the target network. Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch? Wouldn't that hurt your convergence rate badly?
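For readers new to the term, here is a sketch of what "freezing the target network" usually means in standard DQN. It is not taken from this repo. The symbols are assumptions for illustration: θ are the online weights being trained, θ⁻ is the frozen copy used only to compute targets, and C is a hypothetical sync interval.

```latex
y_t =
\begin{cases}
r_t, & \text{if } s_{t+1} \text{ is terminal},\\
r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-}), & \text{otherwise},
\end{cases}
\qquad
L(\theta) = \mathbb{E}\left[\bigl(y_t - Q(s_t, a_t; \theta)\bigr)^2\right],
\qquad
\theta^{-} \leftarrow \theta \ \text{every } C \text{ steps}.
```

Without freezing, θ⁻ is simply θ, so the target y_t shifts with every gradient step. That is the "update the target every batch" behaviour asked about above.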

hashbangCoder avatar Apr 20 '16 09:04 hashbangCoder

Hello, Yeah you are right. Actually I got a reimplemented version. Will submit soon!

yenchenlin avatar Apr 20 '16 10:04 yenchenlin

Hi again, I'm trying to reproduce the results in Keras. I have trained for ~400,000 steps and the bird is still unable to cross the first pipe consistently. My loss is low though (~0.2), and the Q-values are in the range [0, 8]. How long did it take before it actually started working for you, i.e. crossing the first pipe consistently?

hashbangCoder avatar May 10 '16 02:05 hashbangCoder

I can't remember the exact number of iterations, but it was no more than ~1,000,000 steps.

yenchenlin avatar May 10 '16 06:05 yenchenlin

I still cannot find where the target network is frozen in the current version of the code. Does it really have no effect?

xiahouzuoxin avatar May 27 '17 07:05 xiahouzuoxin

@hashbangCoder I ran into the same problem: the silly bird just stays at the top of the screen. Did you fix it?

zsy372901 avatar Sep 19 '17 19:09 zsy372901

I also couldn't find the code that freezes the target network. But thanks for your code. It's been helpful to me.

weijinsong avatar Dec 08 '17 02:12 weijinsong

I wrote a version based on this repo with a frozen target network: FlappyBird_DQN_with_target_network

initial-h avatar Jun 05 '18 03:06 initial-h

Here is another repo with target network. https://github.com/patrick-12sigma/DRL_FlappyBird

I made the target network an option. You can turn it on and off and experiment to see how much it affects the convergence of training.
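To illustrate what such an on/off switch could look like, here is a minimal pure-Python sketch. It is not the code from the linked repo: `TinyQ`, `Agent`, `use_target_network`, and `sync_every` are hypothetical names, and the "network" is a toy linear Q-function so the example runs without TensorFlow.

```python
import copy


class TinyQ:
    """Toy linear Q-function (hypothetical): Q(s, a) = weights[a] * s for a scalar state s."""

    def __init__(self, n_actions=2):
        self.weights = [0.0] * n_actions

    def q_values(self, state):
        return [w * state for w in self.weights]


class Agent:
    """DQN-style learner where the frozen target network is optional."""

    def __init__(self, use_target_network=True, sync_every=100, gamma=0.99, lr=0.01):
        self.online = TinyQ()
        self.use_target_network = use_target_network
        # With the option off, "target" is the very same object as "online",
        # so the targets move after every single update.
        self.target = copy.deepcopy(self.online) if use_target_network else self.online
        self.sync_every = sync_every
        self.gamma = gamma
        self.lr = lr
        self.step = 0

    def td_target(self, reward, next_state, terminal):
        # Targets are always computed from self.target, never directly from self.online.
        if terminal:
            return reward
        return reward + self.gamma * max(self.target.q_values(next_state))

    def train_step(self, state, action, reward, next_state, terminal):
        y = self.td_target(reward, next_state, terminal)
        q = self.online.q_values(state)[action]
        # Gradient step on 0.5 * (y - q)^2, treating y as a constant.
        self.online.weights[action] += self.lr * (y - q) * state
        self.step += 1
        # Copy the online weights into the frozen network every sync_every steps.
        if self.use_target_network and self.step % self.sync_every == 0:
            self.target.weights = list(self.online.weights)
```

With `Agent(use_target_network=True, sync_every=3)`, the target weights stay fixed for two updates and are refreshed on the third. With `use_target_network=False`, the targets follow the online weights after every step.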

I refactored the network into a class and added some logging to track the training process. I also borrowed the human-play function from @initial-h. Thanks!

patrick-llgc avatar Jan 29 '19 03:01 patrick-llgc