
This doesn't seem to be a standard DDQN implementation.

Open mythsman opened this issue 2 years ago • 3 comments

https://github.com/YangRui2015/2048_env/blob/2e9b3938492e4f7a0a2b627b8607ad1c203d273a/dqn_agent.py#L96

I'm new to DQN and confused by this line. Shouldn't it be q_eval_4_next_state_argmax instead of q_eval_4_this_state_argmax?

mythsman avatar May 25 '22 14:05 mythsman

Hi, this is 'Double DQN'. Please refer to the original paper for more details.

YangRui2015 avatar May 25 '22 15:05 YangRui2015

Thanks for your reply. But according to the Double DQN algorithm from ICML 2016, I think the argmax should be taken over the next state's Q-values, not the current state's, while the evaluation in your code comes from the current state: q_eval_total = self.eval_net(batch_state).

[image: Double DQN algorithm from the ICML 2016 paper]
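
For reference, the Double DQN target selects the greedy action with the online network on the *next* state and evaluates that action with the target network. A minimal sketch in PyTorch of what this could look like, assuming names such as eval_net, target_net, batch_reward, batch_next_state, and batch_done that mirror (but are not taken verbatim from) dqn_agent.py:

```python
import torch

def double_dqn_target(eval_net, target_net, batch_reward, batch_next_state, batch_done, gamma=0.99):
    with torch.no_grad():
        # Action selection: argmax over the NEXT state's Q-values from the online (eval) network.
        next_q_eval = eval_net(batch_next_state)                   # [B, num_actions]
        next_actions = next_q_eval.argmax(dim=1, keepdim=True)     # [B, 1]

        # Action evaluation: Q-values of those actions from the target network.
        next_q_target = target_net(batch_next_state)               # [B, num_actions]
        next_q = next_q_target.gather(1, next_actions).squeeze(1)  # [B]

        # Standard TD target; terminal transitions do not bootstrap.
        return batch_reward + gamma * next_q * (1.0 - batch_done)
```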

mythsman avatar May 25 '22 16:05 mythsman

I think you are right. I will check and revise the code later. You can use the DQN method without the 'Double' trick. Thanks!
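
Without the 'Double' trick, the target simply takes the max over the target network's Q-values for the next state. A sketch under the same assumed names:

```python
import torch

def dqn_target(target_net, batch_reward, batch_next_state, batch_done, gamma=0.99):
    with torch.no_grad():
        # Vanilla DQN: action selection and evaluation both use the target network.
        next_q = target_net(batch_next_state).max(dim=1).values    # [B]
        return batch_reward + gamma * next_q * (1.0 - batch_done)
```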

YangRui2015 avatar May 26 '22 02:05 YangRui2015