Deep-Q-Learning
Request help on Double DQN
Hi, thanks a lot for your great work!
I have a question about the Double DQN implementation: does the following line need a `stop_gradient`?

```python
target_q = rewards + (gamma * double_q * (1 - terminal_flags))
```
Here `double_q` comes from the target DQN, so when updating the main DQN, the error will be backpropagated into the target DQN if we don't stop the gradient flow, right? If so, should we stop the gradient as follows?
```python
target_q = tf.stop_gradient(target_q)
```
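For what it's worth, here is a toy self-contained check of the behavior I mean (TF2 eager mode; the values, shapes, and variable names are made up, and action selection by the main network is omitted for brevity):

```python
import tensorflow as tf

# Hypothetical toy batch of 4 transitions; names follow the snippet above.
gamma = 0.99
rewards = tf.constant([1.0, 0.0, 0.5, 1.0])
terminal_flags = tf.constant([0.0, 0.0, 1.0, 0.0])

# Stand-ins for network outputs; made trainable Variables so we can check
# whether gradients flow back into the "target network".
target_net_out = tf.Variable([2.0, 1.5, 0.7, 3.0])  # pretend target DQN output
main_q = tf.Variable([1.8, 1.2, 0.9, 2.5])          # pretend main DQN output

with tf.GradientTape() as tape:
    double_q = target_net_out
    target_q = rewards + (gamma * double_q * (1 - terminal_flags))
    target_q = tf.stop_gradient(target_q)  # block backprop into the target net
    loss = tf.reduce_mean(tf.square(target_q - main_q))

grads = tape.gradient(loss, [main_q, target_net_out])
print(grads[0] is not None)  # main DQN receives a gradient -> True
print(grads[1] is None)      # target DQN receives none -> True
```

Without the `tf.stop_gradient` line, `grads[1]` would be a non-None tensor, i.e. the loss would also push gradients into the target network's output.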
Could you please give some advice? Thanks.